Abstract. We define a class of ranked tree automata TABG generalizing both the tree
automata with local tests between brothers of Bogaert and Tison (1992) and with global
equality and disequality constraints (TAGED) of Filiot et al. (2007). TABG can test for
equality and disequality modulo a given flat equational theory between brother subterms
and between subterms whose positions are defined by the states reached during a
computation. In particular, TABG can check that all the subterms reaching a given state
are distinct. This constraint is related to monadic key constraints for XML documents,
meaning that every two distinct positions of a given type have different values.

We prove decidability of the emptiness problem for TABG. This solves, in particular, 
the open question of the decidability of emptiness for TAGED. We further extend our 
result by allowing global arithmetic constraints for counting the number of occurrences 
of some state or the number of different equivalence classes of subterms (modulo a given 
flat equational theory) reaching some state during a computation. We also adapt the 
model to unranked ordered terms. As a consequence of our results for TABG, we prove 
the decidability of a fragment of the monadic second order logic on trees extended with 
predicates for equality and disequality between subtrees, and cardinality. 
1. Introduction



Tree automata techniques are widely used in several domains like automated deduction (see
e.g. [CDG+07]), static analysis of programs [BT05] or protocols [VGL07, FGVTT04], and
XML processing [Sch07]. However, a severe limitation of standard tree automata (TA) is
that they are not able to test for equality (isomorphism) or disequality between subterms 
in an input term. For instance, the language of terms matching a non-linear pattern such 
as $f(x,x)$ is not regular (i.e. there exists no TA recognizing this language). Let us illustrate
how this limitation can be problematic in the context of XML document processing. XML
documents are commonly represented as labeled trees, and they can be constrained by XML 
schemas, which define both typing restrictions and integrity constraints. All the typing 
formalisms currently used for XML are based on finite tree automata. The key constraints 
for databases are common integrity constraints expressing that every two distinct positions 
of a given type have different values. This is typically the kind of constraint that cannot
be characterized by TA.

One first approach to overcome this limitation of TA consists in adding the possibility 
to make equality or disequality tests at each step of the computation of the automaton. 
The tests are performed locally, between subterms at a bounded distance from the current 
computation position in the input term. The emptiness problem, i.e. whether the language
recognized by a given automaton is empty, is undecidable with such tests [Mon81]. A
decidable subclass is obtained by restricting the tests to sibling subterms [BT92] (see
[CDG+07] for a survey).

Another approach was proposed more recently in [FTT07, FTT08] with the definition
of tree automata with global equality and disequality tests (TAGED). The TAGED do not 
perform the tests during the computation steps but globally on the term, at the end of the 
computation, at positions which are defined by the states reached during the computation. 
For instance, they can express that all the subterms that reached a given state q are equal, 
or that every two subterms that reached respectively the states q and q' are different. 
Nevertheless, arbitrary disequalities are not allowed in TAGED, since such q and q' must be
different. Emptiness has been shown decidable for several subclasses of TAGED [FTT07,
FTT08], but the decidability of emptiness for the whole class remained a challenging open
question.

In this paper, we answer this question positively, for a class of tree recognizers more
general than TAGED. We propose (in Section 3) a class of tree automata with local
constraints between siblings and global constraints (TABG) which significantly extends TAGED
in several directions: (i) TABG combine global constraints à la TAGED with local equality
and disequality constraints between brother subterms à la [BT92], (ii) the equality and
disequality constraints are treated modulo a given flat equational theory (here flat means
that both sides of the equation have the same variables and height, and that this height
is bounded by 1), which allows relations more general than syntactic equalities and
disequalities, e.g. structural equalities and disequalities, (iii) testing global disequality
constraints between subterms that reached the same state is allowed (such tests specify key
constraints, which are not expressible with TAGED), (iv) the global constraints are arbitrary
Boolean combinations (including negation) of atomic equalities and disequalities (in TAGED,
only conjunctions of atoms are allowed, without negation).
In Section 4 we consider the addition to TABG of global counting constraints on the
number $|q|$ of occurrences of a given state $q$ in a computation, or the number $\|q\|$ of
distinct equivalence classes (modulo the flat theory) of subterms reaching a given state $q$ in a
computation. These counting constraints are only allowed to compare states to constants,
like in $|q| < 5$ or $\|q\| + 2\|q'\| > 9$ (with counting constraints able to compare state
cardinalities, like in $|q| = |q'|$, the emptiness problem becomes undecidable). Using this
formalism as an intermediate step, we show that negative literals and disjunctions can be
eliminated without loss of generality in the global constraints of TABG, i.e. that TABG whose
global constraints are restricted to be conjunctions of positive literals (namely positive
conjunctive TABG) already have the same expressiveness as the full TABG class. In particular,
the counting constraints do not improve the expressiveness of TABG.

Our main result, presented in Section 5, is that emptiness is decidable for positive
conjunctive TABG (and hence for TABG). The decision algorithm uses an involved pumping 
argument: every sufficiently large term recognized by the given TABG can be reduced by an 
operation of parallel pumping into a smaller term which is still recognized. The existence 
of the bound for the minimum accepted term is based on a particular well quasi-ordering. 

We show that the emptiness decision algorithm of Section 5 can also be applied to
a generalization of the subclass TAG of TABG without the local constraints computing on
unranked ordered labeled trees (Section 6). This demonstrates the robustness of the method.

As an application of our results, in Section 7 we present a (strict) extension of the
monadic second order logic on trees whose existential fragment corresponds exactly to TAG. 
In particular, we conclude its decidability. 

Related Work. TABG is a strict (decidable) extension of TAG and TA with local equality
and disequality constraints, since the expressiveness of both subclasses is incomparable (see
e.g. [JKV09]).

The tree automata model of [BT92] has been generalized from ranked trees to unranked
ordered trees into a decidable class called UTASC [WL07, LW09]. In unranked trees, the
number of brothers (under a position) is unbounded, and UTASC transitions use MSO formulae (on
words) with 2 free variables in order to select the sibling positions to be tested for equality
and disequality. The decidable generalization of TAG to unranked ordered trees proposed in
Section 6 and the automata of [WL07, LW09] are incomparable. The combination of both
formalisms could be the object of a further study.

Another way to handle subterm equalities is to use automata computing on DAG
representations of terms [Cha99, ANR05]. This model is incomparable to TAG whose constraints
are conjunctions of equalities [JKV09]. The decidable extension of TA with one tree shaped
memory [CC05] can simulate TAG with equality constraints only, provided that at most one
state per run can be used to test equalities [FTT07].

We show in Section 3 that the TABG strictly generalize the TAGED of [FTT07, FTT08].
The latter have been introduced as a tool to decide a fragment of the spatial logic TQL
[FTT07]. Subclasses of TAGED were also shown decidable in correspondence with
fragments of monadic second order logic on trees extended with predicates for subtree
(dis)equality tests. In Section 7 we generalize this correspondence to TAG and a more
natural extension of MSO.

There have been several approaches to extend TA with arithmetic constraints on
cardinalities $|q|$ described above: the constraints can be added to transitions in order to count
between siblings [SSM03, DL06] (in this case we could call them local by analogy with
equality tests) or they can be global [KR02]. We compare in Section 4 the latter approach
(closer to our settings) with our extension of TABG, with respect to emptiness decision. To
our knowledge, this is the first time that arithmetic constraints on cardinalities of the form
$\|q\|$ are studied.

2. Preliminaries 

2.1. Terms, Positions, Replacements. We use the standard notations for terms and
positions, see [BN98]. A signature $\Sigma$ is a finite set of function symbols with arity. We
sometimes denote $\Sigma$ explicitly as $\{f_1 : a_1, \ldots, f_n : a_n\}$ where $f_1, \ldots, f_n$ are the function
symbols, and $a_1, \ldots, a_n$ are the corresponding arities, or as $\{f_1, \ldots, f_n\}$ when the arities
are omitted. We denote the subset of function symbols of $\Sigma$ of arity $m$ as $\Sigma_m$. The set
of (ranked) terms over the signature $\Sigma$ is defined recursively as
$\mathcal{T}(\Sigma) := \{f(t_1, \ldots, t_m) \mid f : m \in \Sigma,\ t_1, \ldots, t_m \in \mathcal{T}(\Sigma)\}$.
Note that the base case of this definition is $\{f \mid f : 0 \in \Sigma\}$,
which coincides with $\Sigma_0$ by omitting the arity. Elements of this subset are called constants.

Positions in terms are denoted by sequences of natural numbers. With $\lambda$ we denote the
empty sequence (root position), and $p.p'$ denotes the concatenation of positions $p$ and $p'$.
The set of positions of a term is defined recursively as
$Pos(f(t_1, \ldots, t_m)) = \{\lambda\} \cup \{i.p \mid i \in \{1, \ldots, m\} \wedge p \in Pos(t_i)\}$.
A term $t \in \mathcal{T}(\Sigma)$ can be seen as a function from its set of
positions $Pos(t)$ into $\Sigma$. For this reason, the symbol labeling the position $p$ in $t$ shall be
denoted by $t(p)$. By $p < p'$ and $p \leq p'$ we denote that $p$ is a proper prefix of $p'$, and that
$p$ is a prefix of $p'$, respectively. In these cases, $p'$ is necessarily of the form $p.p''$, and we
define $p' - p$ as $p''$. Two positions $p_1, p_2$ incomparable with respect to the prefix ordering
are called parallel, and it is denoted by $p_1 \parallel p_2$. The subterm of $t$ at position $p$, denoted
$t|_p$, is defined recursively as $t|_\lambda = t$ and $f(t_1, \ldots, t_m)|_{i.p} = t_i|_p$. The replacement in $t$
of the subterm at position $p$ by $s$, denoted $t[s]_p$, is defined recursively as $t[s]_\lambda = s$ and
$f(t_1, \ldots, t_{i-1}, t_i, t_{i+1}, \ldots, t_m)[s]_{i.p} = f(t_1, \ldots, t_{i-1}, t_i[s]_p, t_{i+1}, \ldots, t_m)$.
The height of a term $t$, denoted $h(t)$, is the maximal length of a position of $Pos(t)$.
In particular, the length of $\lambda$ is 0.
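
The recursive definitions above translate almost literally into code. The following is a minimal sketch, not part of the original development: it assumes terms are encoded as nested Python tuples `(symbol, child_1, ..., child_m)` and positions as tuples of 1-based child indices, with the empty tuple playing the role of the root position. The function names are illustrative only.

```python
# Terms are nested tuples (symbol, child_1, ..., child_m); a constant is (symbol,).
# Positions are tuples of 1-based child indices; the empty tuple () is the root.

def positions(t):
    """Pos(t): the root, plus i.p for every child i and every p in Pos(t_i)."""
    result = [()]
    for i, child in enumerate(t[1:], start=1):
        result.extend((i,) + p for p in positions(child))
    return result

def subterm(t, p):
    """t|_p: follow the child indices of p from the root."""
    for i in p:
        t = t[i]
    return t

def replace(t, p, s):
    """t[s]_p: copy of t whose subterm at position p is replaced by s."""
    if not p:
        return s
    i = p[0]
    return t[:i] + (replace(t[i], p[1:], s),) + t[i + 1:]

def height(t):
    """h(t): maximal length of a position of t."""
    return max(len(p) for p in positions(t))

def is_prefix(p, q):
    """p <= q in the prefix ordering on positions."""
    return q[:len(p)] == p

def parallel(p, q):
    """p || q: neither position is a prefix of the other."""
    return not is_prefix(p, q) and not is_prefix(q, p)

a = ("a",)
t = ("f", ("f", a, a), a)  # the term f(f(a, a), a)
print(positions(t))
print(replace(t, (2,), ("f", a, a)))
```

Here `replace(t, (2,), ("f", a, a))` builds $f(f(a,a), f(a,a))$ from $f(f(a,a), a)$, which is the kind of subterm replacement used later in pumping arguments.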

2.2. Tree automata. A tree automaton (TA, see e.g. [CDG+07]) is a tuple $\mathcal{A} = (Q, \Sigma, F, \Delta)$
where $Q$ is a finite set of states, $\Sigma$ is a signature, $F \subseteq Q$ is a subset of final (or accepting)
states and $\Delta$ is a set of transition rules of the form $f(q_1, \ldots, q_m) \to q$ where $f : m \in \Sigma$,
$q_1, \ldots, q_m, q \in Q$. Sometimes, we shall refer to $\mathcal{A}$ as a subscript of its components, like in
$Q_{\mathcal{A}}$ to indicate that this is the set of states of $\mathcal{A}$.

A run of $\mathcal{A}$ is a pair $r = (t, M)$ where $t$ is a term in $\mathcal{T}(\Sigma)$ and $M : Pos(t) \to \Delta_{\mathcal{A}}$ is a
mapping satisfying the following statement for each $p \in Pos(t)$: if $t|_p$ is written of the form
$f(t_1, \ldots, t_m)$, and $M(p.1), \ldots, M(p.m)$ are rules with right-hand side states $q_1, \ldots, q_m \in Q_{\mathcal{A}}$,
respectively, then $M(p)$ is a rule of the form $f(q_1, \ldots, q_m) \to q$ for some $q \in Q_{\mathcal{A}}$. We
write $r(p)$ for the right-hand side state of $M(p)$, and say that $r$ is a run of $\mathcal{A}$ on $t$. Moreover,
by $term(r)$ we refer to $t$, and by $symbol(r)$ we refer to $t(\lambda)$. The run $r$ is called successful
(or accepting) if $r(\lambda)$ is in $F_{\mathcal{A}}$. The language $\mathcal{L}(\mathcal{A})$ of $\mathcal{A}$ is the set of terms $t$ for which there
exists a successful run of $\mathcal{A}$. A language $L$ is called regular if there exists a TA $\mathcal{A}$ satisfying
$L = \mathcal{L}(\mathcal{A})$. To ease the explanations, we shall use term-like notations for runs defined
as follows in the natural way. For a run $r = (t, M)$, by $Pos(r)$ we denote $Pos(t)$, and by
$h(r)$ we denote $h(t)$. Similarly, by $r|_p$ we denote the run $(t|_p, M|_p)$, where $M|_p$ is defined as
$M|_p(p') = M(p.p')$ for each $p'$ in $Pos(t|_p)$, and say that $r|_p$ is a subrun of $r$. Moreover, for
a run $r' = (t', M')$ such that the states $r'(\lambda)$ and $r(p)$ coincide, by $r[r']_p$ we denote the run
$(t[t']_p, M[M']_p)$, where $M[M']_p$ is defined as $M[M']_p(p.p') = M'(p')$ for each $p'$ in $Pos(t')$,
and as $M[M']_p(p') = M(p')$ for each $p'$ with $p \not\leq p'$.
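
To make the notion of run concrete, here is a small sketch that computes, bottom-up, the set of states reachable at the root of a term by some run of a TA. The tuple encoding of terms, the dictionary encoding of rules and the example automaton are assumptions made for illustration; they do not come from the paper.

```python
from itertools import product

def reachable_states(t, rules):
    """Set of states q such that some run on t has q at the root."""
    child_sets = [reachable_states(c, rules) for c in t[1:]]
    states = set()
    # Try every combination of states reachable at the children.
    for sources in product(*child_sets):
        states |= rules.get((t[0], sources), set())
    return states

def accepts(t, rules, final):
    """True if some run on t reaches a final state at the root."""
    return bool(reachable_states(t, rules) & final)

# Hypothetical TA over {a:0, f:2}: a -> q0, f(q0, q0) -> q0 | qf, final state qf.
RULES = {
    ("a", ()): {"q0"},
    ("f", ("q0", "q0")): {"q0", "qf"},
}
FINAL = {"qf"}

a = ("a",)
print(accepts(("f", a, ("f", a, a)), RULES, FINAL))
print(accepts(a, RULES, FINAL))
```

This automaton accepts every term rooted by $f$, including $f(a, f(a,a))$ whose two arguments differ: the transition at the root only sees the states of the children, not the subterms themselves.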



2.3. Tree automata with local constraints between brothers. A tree automaton with
constraints between brothers (defined in [BT92] and called TACBB in [CDG+07]) is a tuple
$\mathcal{A} = (Q, \Sigma, F, \Delta)$ where $Q$, $\Sigma$ and $F$ are defined as for TA, but with the difference that $\Delta$ is
a set of constrained rules of the form $f(q_1, \ldots, q_m) \xrightarrow{C} q$, where $C$ is a set of equalities and
disequalities of the form $i \approx j$ or $i \not\approx j$ for $i, j \in \{1, \ldots, m\}$. We call $C$ a local constraint
between brothers. By $ta(\mathcal{A})$ we define the TA obtained from $\mathcal{A}$ by removing all constraints
from $\Delta$.

A run of a TACBB $\mathcal{A}$ is a pair $r = (t, M)$ defined similarly to the case of TA; $t$ is a
term in $\mathcal{T}(\Sigma)$ and the mapping $M : Pos(t) \to \Delta_{\mathcal{A}}$ satisfies the following statement for each
$p \in Pos(t)$: if $t|_p$ is written of the form $f(t_1, \ldots, t_m)$, and $M(p.1), \ldots, M(p.m)$ are rules
with right-hand side states $q_1, \ldots, q_m \in Q_{\mathcal{A}}$, respectively, then $M(p)$ is a rule of the form
$f(q_1, \ldots, q_m) \xrightarrow{C} q$ for some $q \in Q_{\mathcal{A}}$ and constraint between brothers $C$. Moreover, for each
equality $i \approx j$ in $C$, $t_i = t_j$ holds, and for each disequality $i \not\approx j$ in $C$, $t_i \neq t_j$ holds. The
notions of successful run and recognized language are defined for TACBB analogously to the
case of TA.
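
Since a local constraint only inspects the sibling subterms at the current position, the set of states reachable at a position can still be computed bottom-up. The sketch below illustrates this under assumed encodings (terms as nested tuples, rules as 4-tuples, a constraint as a list of triples `(i, j, equal)`); the example automaton with a single constraint $1 \approx 2$ is a hypothetical illustration.

```python
def reachable_states(t, rules):
    """States reachable at the root of t by some run respecting local constraints."""
    children = t[1:]
    child_sets = [reachable_states(c, rules) for c in children]
    states = set()
    for symbol, sources, constraints, target in rules:
        if symbol != t[0] or len(sources) != len(children):
            continue
        # Each child must be able to reach the state required by the rule.
        if any(q not in s for q, s in zip(sources, child_sets)):
            continue
        # (i, j, True) encodes i ~ j: t_i == t_j; (i, j, False) encodes i !~ j.
        if all((children[i - 1] == children[j - 1]) == equal
               for i, j, equal in constraints):
            states.add(target)
    return states

def accepts(t, rules, final):
    return bool(reachable_states(t, rules) & final)

# Hypothetical TACBB over {a:0, f:2}; the last rule carries the constraint 1 ~ 2.
RULES = [
    ("a", (), [], "q0"),
    ("f", ("q0", "q0"), [], "q0"),
    ("f", ("q0", "q0"), [(1, 2, True)], "qf"),
]
FINAL = {"qf"}

a = ("a",)
faa = ("f", a, a)
print(accepts(("f", faa, faa), RULES, FINAL))
print(accepts(("f", a, faa), RULES, FINAL))
```

With the constraint in place, only terms of the form $f(t,t)$ reach the final state, in contrast with the unconstrained automaton obtained by $ta(\cdot)$.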



2.4. Term equations. Given a set of variables $\mathcal{X}$, the set of (ranked) terms over $\Sigma$ and
$\mathcal{X}$ is defined as $\mathcal{T}(\Sigma \cup \mathcal{X})$ by considering arity 0 for the elements of $\mathcal{X}$. A substitution
$\sigma$ is a mapping from variables to terms $\sigma : \mathcal{X} \to \mathcal{T}(\Sigma \cup \mathcal{X})$. It is also considered as a
function from arbitrary terms to terms $\sigma : \mathcal{T}(\Sigma \cup \mathcal{X}) \to \mathcal{T}(\Sigma \cup \mathcal{X})$ by the recursive definition
$\sigma(f(t_1, \ldots, t_m)) = f(\sigma(t_1), \ldots, \sigma(t_m))$ for every function symbol $f$ and subterms $t_1, \ldots, t_m$.

An equation between terms is an unordered pair of terms denoted $l \approx r$. Given a set
of equations $E$ and two terms $s, t$, we say that $s$ and $t$ are equivalent modulo $E$, denoted
$s =_E t$, if there exist terms $s_1, s_2, \ldots, s_n$, $n \geq 1$ satisfying the following statement: $s = s_1$,
$s_n = t$, and for each $i \in \{1, \ldots, n-1\}$, there exists an equation $l \approx r$ in $E$, a substitution $\sigma$,
and a position $p$, such that $s_i|_p = \sigma(l)$ and $s_{i+1} = s_i[\sigma(r)]_p$. A flat equation is an equation
$l \approx r$ where $l$ and $r$ are terms satisfying $h(l) = h(r) \leq 1$, and any variable $x$ occurs in $l$ if
and only if $x$ occurs in $r$. A flat theory is a set of flat equations.
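
As an illustration of the flatness condition, the following sketch tests whether an equation is flat. It assumes an encoding that is not in the paper: variables are Python strings and applications are tuples `(symbol, arg_1, ..., arg_m)`.

```python
def height(t):
    """Height of a term; variables (strings) and constants have height 0."""
    if isinstance(t, str) or len(t) == 1:
        return 0
    return 1 + max(height(c) for c in t[1:])

def variables(t):
    """Set of variables occurring in t."""
    if isinstance(t, str):
        return {t}
    result = set()
    for c in t[1:]:
        result |= variables(c)
    return result

def is_flat(l, r):
    """l ~ r is flat if h(l) = h(r) <= 1 and both sides have the same variables."""
    return height(l) == height(r) <= 1 and variables(l) == variables(r)

commutativity = (("f", "x", "y"), ("f", "y", "x"))
associativity = (("f", "x", ("f", "y", "z")), ("f", ("f", "x", "y"), "z"))
print(is_flat(*commutativity))
print(is_flat(*associativity))
```

Commutativity $f(x,y) \approx f(y,x)$ passes the test, while associativity does not, because both of its sides have height 2.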

The following technical lemma shows that equivalence modulo a flat theory is preserved 
by certain replacements of subterms. It will be useful in Section 5.

Lemma 2.1. Let $E$ be a flat theory. Let $s = f(s_1, \ldots, s_n)$, $t = g(t_1, \ldots, t_m)$,
$s' = f(s'_1, \ldots, s'_n)$ and $t' = g(t'_1, \ldots, t'_m)$ be terms satisfying the following conditions:

• For each $i \in \{1, \ldots, n\}$, $(s_i \in \Sigma_0 \Leftrightarrow s'_i \in \Sigma_0)$ and $(s_i, s'_i \in \Sigma_0 \Rightarrow s_i =_E s'_i)$ hold.

• For each $j \in \{1, \ldots, m\}$, $(t_j \in \Sigma_0 \Leftrightarrow t'_j \in \Sigma_0)$ and $(t_j, t'_j \in \Sigma_0 \Rightarrow t_j =_E t'_j)$ hold.

• For each $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, m\}$, $(s'_i =_E t'_j \Leftrightarrow s_i =_E t_j)$ holds.

Then, $s =_E t \Leftrightarrow s' =_E t'$ holds.

Proof. We prove the left-to-right direction only. The other one is analogous by swapping
the roles of $s$ and $t$ by the roles of $s'$ and $t'$, respectively.

Since $s =_E t$ holds, there exist terms $u_1, u_2, \ldots, u_k$, $k \geq 1$ satisfying the following
statement: $s = u_1$, $u_k = t$, and for each $i \in \{1, \ldots, k-1\}$, there exists an equation $l \approx r$ in
$E$, a substitution $\sigma$, and a position $p$, such that $u_i|_p = \sigma(l)$ and $u_{i+1} = u_i[\sigma(r)]_p$.

We prove the statement by induction on $k$. For $k = 1$, $s = t$ holds. Thus, $g$ is $f$, $m$ is
$n$, and for each $i \in \{1, \ldots, n\}$, $s_i = t_i$ holds. In particular, each $s_i =_E t_i$ holds. Therefore,
each $s'_i =_E t'_i$ also holds, and hence $s' = f(s'_1, \ldots, s'_n) =_E f(t'_1, \ldots, t'_n) = t'$ holds.

Now, assume $k > 1$. Let $l \approx r$, $p$ and $\sigma$ be the rule, position and substitution satisfying
$u_1|_p = \sigma(l)$ and $u_2 = u_1[\sigma(r)]_p$. Recall that $u_1$ is $s$. First, suppose that $p$ is not $\lambda$. Then, $p$
is of the form $j.p'$ for some $j \in \{1, \ldots, n\}$ and position $p'$. Note that $u_2|_j =_E u_1|_j$ holds, and
for each $i \in \{1, \ldots, n\} \setminus \{j\}$, $u_2|_i = u_1|_i$ holds. Thus, $u_2$ is of the form $f(v_1, \ldots, v_n)$ and for
each $i \in \{1, \ldots, n\}$, $v_i =_E s_i$ holds. Moreover, since $E$ is a flat theory, the step at $p$ preserves
the height, and hence, for each $i \in \{1, \ldots, n\}$, $v_i \in \Sigma_0 \Leftrightarrow s_i \in \Sigma_0$ and $v_i, s_i \in \Sigma_0 \Rightarrow v_i =_E s_i$
hold. From the statement of the lemma, the following conditions follow:

• For each $i \in \{1, \ldots, n\}$, $(v_i \in \Sigma_0 \Leftrightarrow s'_i \in \Sigma_0)$ and $(v_i, s'_i \in \Sigma_0 \Rightarrow v_i =_E s'_i)$ hold.

• For each $j \in \{1, \ldots, m\}$, $(t_j \in \Sigma_0 \Leftrightarrow t'_j \in \Sigma_0)$ and $(t_j, t'_j \in \Sigma_0 \Rightarrow t_j =_E t'_j)$ hold.

• For each $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, m\}$, $(s'_i =_E t'_j \Leftrightarrow v_i =_E t_j)$ holds.

By induction hypothesis, $f(s'_1, \ldots, s'_n) =_E g(t'_1, \ldots, t'_m)$ holds, and we are done.

Now, consider the case where $p$ is $\lambda$. In this case $s = u_1 = \sigma(l)$, and $u_2 = \sigma(r)$. Since $E$
is a flat theory, $l$ and $r$ are of the form $f(\alpha_1, \ldots, \alpha_n)$ and $h(\beta_1, \ldots, \beta_\mu)$, where either $n, \mu > 0$
or $n = \mu = 0$, and $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_\mu$ are either constants or variables. Moreover, a
variable occurs in $l$ if and only if it occurs in $r$. Note that $\sigma(\alpha_1) = s_1, \ldots, \sigma(\alpha_n) = s_n$
holds. We call $v_1 = \sigma(\beta_1), \ldots, v_\mu = \sigma(\beta_\mu)$. Note that $u_2 = h(v_1, \ldots, v_\mu)$. We define terms
$v'_1, \ldots, v'_\mu$ as follows for each $i$ in $\{1, \ldots, \mu\}$. If $v_i$ is a constant, then we define $v'_i$ as $v_i$.
Otherwise, if $v_i$ is not a constant, then $\beta_i$ is a variable $x$. Since $E$ is a flat theory, some
$\alpha_j$ (we choose any) must be $x$. In this case we define $v'_i$ as $s'_j$. With these definitions, the
following conditions follow:

• For each $i \in \{1, \ldots, \mu\}$, $(v_i \in \Sigma_0 \Leftrightarrow v'_i \in \Sigma_0)$ and $(v_i, v'_i \in \Sigma_0 \Rightarrow v_i =_E v'_i)$ hold.

• For each $j \in \{1, \ldots, m\}$, $(t_j \in \Sigma_0 \Leftrightarrow t'_j \in \Sigma_0)$ and $(t_j, t'_j \in \Sigma_0 \Rightarrow t_j =_E t'_j)$ hold.

• For each $i \in \{1, \ldots, \mu\}$ and $j \in \{1, \ldots, m\}$, $(v'_i =_E t'_j \Leftrightarrow v_i =_E t_j)$ holds.

By induction hypothesis, $h(v'_1, \ldots, v'_\mu) =_E g(t'_1, \ldots, t'_m)$ holds.

Now, let $s''_1, \ldots, s''_n$ be defined as follows for each $i$ in $\{1, \ldots, n\}$. If $s'_i$ is not a constant
then define $s''_i$ as $s'_i$. Otherwise, if $s'_i$ is a constant, then define $s''_i$ as $s_i$. By the condition
$(s_i, s'_i \in \Sigma_0 \Rightarrow s_i =_E s'_i)$ we have that $f(s''_1, \ldots, s''_n) =_E f(s'_1, \ldots, s'_n)$ holds. Moreover, the
same rule $l \approx r$ can be used to prove $f(s''_1, \ldots, s''_n) =_E h(v'_1, \ldots, v'_\mu)$. Hence,
$f(s'_1, \ldots, s'_n) =_E f(s''_1, \ldots, s''_n) =_E h(v'_1, \ldots, v'_\mu) =_E g(t'_1, \ldots, t'_m)$ holds, and we are done. □

2.5. Well quasi-orderings. A well quasi-ordering [Gal91] $\leq$ on a set $S$ is a reflexive and
transitive relation such that any infinite sequence of elements $e_1, e_2, \ldots$ of $S$ contains an
increasing pair $e_i \leq e_j$ with $i < j$.

3. Tree Automata with Global Constraints 

In this section, we define a class of tree automata with global constraints strictly
generalizing both the TACBB of [BT92] and the TAGED of [FTT08]. The generalization consists
in considering more general global constraints, and interpreting all the constraints modulo
a flat equational theory.

As an intermediate step, we define an extension of the TACBB of [BT92] where the local
constraints between brothers are considered modulo a flat equational theory.

Definition 3.1. A tree automaton with constraints between brothers modulo a flat theory
(TAB) is a tuple $\mathcal{A} = (Q, \Sigma, F, \Delta, E)$ where $(Q, \Sigma, F, \Delta)$ is a TACBB and $E$ is a flat equational
theory.

By $ta(\mathcal{A})$ we denote $ta((Q, \Sigma, F, \Delta))$.

A run of a TAB $\mathcal{A} = (Q, \Sigma, F, \Delta, E)$ is a pair $r = (t, M)$ defined analogously to a run
of a TACBB, except that the constraints between brothers are interpreted modulo $E$. More
specifically, for each position $p$ in $Pos(t)$, if $t|_p$ is written of the form $f(t_1, \ldots, t_m)$, and
$M(p.1), \ldots, M(p.m)$ are rules with right-hand side states $q_1, \ldots, q_m \in Q$, respectively, then
$M(p)$ is a transition rule of $\Delta_{\mathcal{A}}$ of the form $f(q_1, \ldots, q_m) \xrightarrow{C} q$ for some $q \in Q$ and constraint
between brothers $C$. Moreover, for each equality $i \approx j$ in $C$, $t_i =_E t_j$ holds, and for each
disequality $i \not\approx j$ in $C$, $t_i \neq_E t_j$ holds. The notions of successful run and recognized language
are defined for TAB analogously to the case of TA.

We further extend this class TAB with global equality and disequality constraints
generalizing those of TAGED [FTT08].

Definition 3.2. A tree automaton with global and brother constraints modulo a flat theory
(TABG) is a tuple $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ where $(Q, \Sigma, F, \Delta, E)$ is a TAB, denoted $tab(\mathcal{A})$,
and $C$ is a Boolean combination of atomic constraints of the form $q \approx q'$ or $q \not\approx q'$, where
$q, q' \in Q$.

By $ta(\mathcal{A})$ we denote $ta(tab(\mathcal{A}))$.

A run of a TABG $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ is a run $r = (t, M)$ of $tab(\mathcal{A})$ such that $r$ satisfies
$C$, denoted $r \models C$, where the satisfiability of constraints is defined as follows. For atomic
constraints, $r \models q \approx q'$ (respectively $r \models q \not\approx q'$) holds if and only if for all different positions
$p, p' \in Pos(t)$ such that $r(p) = q$ and $r(p') = q'$, $t|_p =_E t|_{p'}$ (respectively $t|_p \neq_E t|_{p'}$)
holds. This notion of satisfiability is extended to Boolean combinations as usual. As for TA,
we say that $r$ is a run of $\mathcal{A}$ on $t$. A run $r$ of $\mathcal{A}$ on $t \in \mathcal{T}(\Sigma)$ is successful (or accepting) if
$r(\lambda) \in F$. The language $\mathcal{L}(\mathcal{A})$ of $\mathcal{A}$ is the set of terms $t$ for which there exists a successful
run of $\mathcal{A}$.

It is important to note that the semantics of $\neg(q \approx q')$ and $q \not\approx q'$ differ, as well as the
semantics of $\neg(q \not\approx q')$ and $q \approx q'$. This is because we have a "for all" quantifier in both
definitions of semantics of $q \approx q'$ and $q \not\approx q'$.
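
The difference between a negated atom and the opposite atom can be checked on a tiny example. The sketch below assumes an empty equational theory (so equivalence is syntactic equality), terms as nested tuples, and a run given directly as a dictionary from positions to states; the term and state names are hypothetical.

```python
def subterm(t, p):
    """t|_p for a position p given as a tuple of 1-based child indices."""
    for i in p:
        t = t[i]
    return t

def atom_holds(t, run, q1, q2, equal):
    """Check q1 ~ q2 (equal=True) or q1 !~ q2 (equal=False) on the run.

    The atom must hold for ALL pairs of different positions p, p2
    with run[p] == q1 and run[p2] == q2.
    """
    for p, state_p in run.items():
        for p2, state_p2 in run.items():
            if p != p2 and state_p == q1 and state_p2 == q2:
                if (subterm(t, p) == subterm(t, p2)) != equal:
                    return False
    return True

a, b = ("a",), ("b",)
t = ("g", a, a, b)  # the term g(a, a, b)
run = {(): "qr", (1,): "q", (2,): "q", (3,): "q"}

equality = atom_holds(t, run, "q", "q", True)      # q ~ q
disequality = atom_holds(t, run, "q", "q", False)  # q !~ q
print(equality, disequality, not equality)
```

On this run, $q \approx q$ fails because of the pair $a, b$, and $q \not\approx q$ fails because of the pair $a, a$. Hence $\neg(q \approx q)$ holds while $q \not\approx q$ does not.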

Let us introduce some notations, summarized in Figure 1, that we use below to
characterize some classes of tree automata related to TABG (Figure 1 also refers to a class defined
in Section 4). A TABG $\mathcal{A}$ is called positive if $C_{\mathcal{A}}$ is a disjunction of conjunctions of atomic
constraints and it is called positive conjunctive if $C_{\mathcal{A}}$ is a conjunction of atomic constraints.
The subclass of positive conjunctive TABG is denoted by TABG$^{\wedge}$.

We recall that a TAB where all the constraints are empty is just a TA. For a TABG $\mathcal{A}$,
when the theory $E_{\mathcal{A}}$ is empty and $tab(\mathcal{A})$ is just a TA, we say that $\mathcal{A}$ is just a tree automaton
with global constraints (TAG). Its subclass with positive conjunctive constraints is denoted
TAG$^{\wedge}$.

With the notation TABG[$\tau_1, \ldots, \tau_m$], we characterize the class of tree automata with
global and brother constraints modulo a flat theory whose global constraints are Boolean






L. BARGUNO, C. CREUS, G. GODOY, F. JACQUEMARD, AND C. VACHER 



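[Figure 1 (diagram not recoverable from the extraction): TABG[$\approx, \not\approx, \mathbb{N}$] $\equiv$ TABG[$\approx, \not\approx$] $\equiv$ positive TABG[$\approx, \not\approx$] $\equiv$ TABG$^{\wedge}$[$\approx, \not\approx$] at the top, TA at the bottom. Legend: arrows denote effective strict inclusion, $\equiv$ denotes effective equivalence.]

Figure 1: Decidable classes of TA with local and global constraints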

combination of atomic constraints of types $\tau_1, \ldots, \tau_m$. The types $\approx$ and $\not\approx$ denote
respectively the atomic constraints of the form $q \approx q'$ and $q \not\approx q'$, where $q, q'$ are states. For
instance, the abbreviation TABG used in Definition 3.2 stands for TABG[$\approx, \not\approx$]. This notation
is extended to the positive conjunctive fragment by TABG$^{\wedge}$[$\tau_1, \ldots, \tau_k$] and to the fragment
without local constraints between brothers, by TAG[$\tau_1, \ldots, \tau_k$].

3.1. Expressiveness. The class of regular languages is strictly included in the class of 
TABG languages due to the constraints. 

Example 3.3. Let $\Sigma = \{a : 0, f : 2\}$. The set $\{f(t,t) \mid t \in \mathcal{T}(\Sigma)\}$ is not a regular tree
language (this can be shown using a classical pumping argument).
However, it is recognized by the following TAB:

$(\{q_0, q_f\}, \Sigma, \{q_f\}, \{a \to q_0,\ f(q_0, q_0) \to q_0,\ f(q_0, q_0) \xrightarrow{1 \approx 2} q_f\}, \emptyset)$,

and it is also recognized by the following TAG[$\approx$]:

$\mathcal{A} = (\{q_0, q_1, q_f\}, \Sigma, \{q_f\}, \{a \to q_0 \mid q_1,\ f(q_0, q_0) \to q_0 \mid q_1,\ f(q_1, q_1) \to q_f\}, \emptyset, q_1 \approx q_1)$,

where $t \to q \mid q_r$ is an abbreviation for $t \to q$ and $t \to q_r$. An example of successful run of
$\mathcal{A}$ on $t = f(f(a,a), f(a,a))$ is $q_f(q_1(q_0, q_0), q_1(q_0, q_0))$, where we use term-like notation for
marking the reached state at each position.

Moreover, the TAGED of [FTT08] are also a particular case of TAG[$\approx, \not\approx$], since they can be
redefined in our setting as restricted TAG$^{\wedge}$[$\approx, \not\approx$], where the equational theory is empty, and
where $q$ and $q'$ are required to be distinct in any atomic constraint of the form $q \not\approx q'$.

Reflexive disequality constraints such as $q \not\approx q$ correspond to monadic key constraints
for XML documents, meaning that every two distinct positions of type $q$ have different
values. A state $q$ of a TAG[$\approx, \not\approx$] can be used for instance to characterize unique identifiers
as in the following example, which presents a TAG[$\approx, \not\approx$] whose language cannot be recognized
by a TAGED. This example will be referred to several times in Section 5 in order to illustrate
the definitions used in the decision procedure of the emptiness problem for TAG[$\approx, \not\approx$].
Figure 2: Term and successful run (Example 3.4).



Example 3.4. The TAG[$\approx, \not\approx$] of our running example accepts (in state $q_M$) lists of dishes
called menus, where every dish is associated with one identifier (state $q_{id}$) and the time
needed to cook it (state $q_t$). We have other states accepting digits ($q_d$), numbers ($q_N$) and
lists of dishes ($q_L$).

The TAG[$\approx, \not\approx$] $\mathcal{A} = (Q, \Sigma, F, \Delta, \emptyset, C)$ is defined as follows: $\Sigma = \{0, \ldots, 9 : 0,\ N, L_0 : 2,\ L, M : 3\}$,
$Q = \{q_d, q_N, q_{id}, q_t, q_L, q_M\}$, $F = \{q_M\}$, and $\Delta = \{i \to q_d \mid q_N \mid q_{id} \mid q_t : 0 \leq i \leq 9\}$
$\cup\ \{N(q_d, q_N) \to q_N \mid q_{id} \mid q_t,\ L_0(q_{id}, q_t) \to q_L,\ L(q_{id}, q_t, q_L) \to q_L,\ M(q_{id}, q_t, q_L) \to q_M\}$.

The constraint $C$ ensures that all the identifiers of the dishes in a menu are pairwise
distinct (i.e. that $q_{id}$ is a key) and that the time to cook is the same for all dishes:
$C = q_{id} \not\approx q_{id} \wedge q_t \approx q_t$. A term in $\mathcal{L}(\mathcal{A})$ together with an associated successful run are depicted
in Figure 2.

Although this is a simple exercise, let us establish formally that TAG[$\approx, \not\approx$] are strictly
more expressive than TAGED.

Lemma 3.5. The class of languages recognized by TAG$^{\wedge}$[$\approx, \not\approx$] strictly includes the class of
languages recognized by TAGED.

Proof. Since a TAGED is just a TAG$^{\wedge}$[$\approx, \not\approx$] where no constraint of the form $q \not\approx q$ occurs, the
inclusion holds. In order to see that it is strict, it suffices to show a language $L$ which can
be recognized by a TAG$^{\wedge}$[$\approx, \not\approx$] but not by a TAGED.

Let $\Sigma = \{a : 0, s : 1, f : 2\}$. The set $L$ of terms of $\mathcal{T}(\Sigma)$ of the form
$f(s^{n_1}(a), f(s^{n_2}(a), \ldots f(s^{n_k}(a), a) \ldots))$ such that $k \geq 0$ and the natural numbers $n_i$, for $i \leq k$, are pairwise
distinct, is recognized by the following TAG$^{\wedge}$[$\approx, \not\approx$]:

$(\{q_a, q, q_f\}, \Sigma, \{q_f\}, \{a \to q_a \mid q \mid q_f,\ s(q_a) \to q_a \mid q,\ f(q, q_f) \to q_f\}, \emptyset, q \not\approx q)$.

Assume that there exists a TAG$^{\wedge}$[$\approx, \not\approx$] $\mathcal{A}$ without reflexive disequality constraints of
the form $q \not\approx q$ (i.e. a TAGED), recognizing this language $L$. Then, there exists an accepting
run $r$ of $\mathcal{A}$ on the term $t = f(s(a), f(s^2(a), \ldots f(s^{|Q_{\mathcal{A}}|+1}(a), a) \ldots)) \in L$. Therefore, $r \models C_{\mathcal{A}}$
(the global constraint of $\mathcal{A}$, which is positive by hypothesis).

There are two different positions $p_i = 2.2.\ldots.2.1$ (with $i$ occurrences of 2) and
$p_j = 2.2.\ldots.2.1$ (with $j$ occurrences of 2), $0 \leq i < j \leq |Q_{\mathcal{A}}|$, such that $r(p_i) = r(p_j)$.
Let us show that $r' = r[r|_{p_i}]_{p_j}$ is an accepting run of $\mathcal{A}$ on $t' = t[t|_{p_i}]_{p_j}$. Since $t'$ contains
the subterm $t|_{p_i}$ at the two positions $p_i$ and $p_j$, $t'$ is not in $L$, and this contradicts the
assumption that $\mathcal{A}$ recognizes $L$. Since $r(p_i) = r(p_j)$ and $r$ is a run of $\mathcal{A}$ on $t$, $r'$ is a run
of $ta(\mathcal{A})$ on $t'$. Hence, it suffices to prove that the constraint $C_{\mathcal{A}}$ is satisfied by $r'$. Consider
a position $p$ of the form $2.2.\ldots.2$ with $|p| \leq j$. We start by proving that any atomic constraint
involving $r'(p)$ is satisfied. Note that $r'(p) = r(p)$ holds, and that the subterm $t|_p$ has only this
occurrence in $t$. Thus, any atomic constraint involving $r(p)$ and a state $q$ occurring in $r$ is necessarily
of the form $r(p) \not\approx q$. Since any state occurring in $r'$ occurs also in $r$, any atomic constraint
involving $r'(p)$ and a state $q$ occurring in $r'$ is of the form $r'(p) \not\approx q$. Moreover, the subterm
$t'|_p$ has only this occurrence in $t'$. Thus, such a constraint is satisfied. Now consider two
different positions $p_1, p_2$ which are not of the form described above. It remains to see that
any atomic constraint involving $r'(p_1)$ and $r'(p_2)$ is satisfied. In the case where $r'|_{p_1}$ and
$r'|_{p_2}$ are different, this is a direct consequence of the fact that both subruns $r'|_{p_1}$ and $r'|_{p_2}$
are also subruns of $r$ at different positions. Otherwise, in the case where $r'|_{p_1}$ and $r'|_{p_2}$ are
the same subrun, $r'(p_1) = r'(p_2)$ holds, and any atomic constraint involving $r'(p_1)$ and
$r'(p_2)$ must be of the form $r'(p_1) \approx r'(p_2)$ because $\mathcal{A}$ has no reflexive disequalities. Thus,
the atomic constraint is also satisfied in this case. □

The following example shows a TABG recognizing a language that cannot be recognized
by a TAG[$\approx, \not\approx$]. The proof is a simple exercise and it is left to the reader.

Example 3.6. Assume that the terms of Example 3.4 are now used to record the activity of
a restaurant. To this end, we transform the TAG of Example 3.4 into a TABG as follows. First,
in order to simplify the example we omit the restriction that all cooking times coincide, i.e.
$C = q_{id} \not\approx q_{id}$. Second, we add a new argument of type $q_t$ to $L_0$, $L$ and $M$, so that the
old argument $q_t$ characterizes the theoretical time to cook, and the new $q_t$ characterizes
the real time that was needed to cook the dish. Let us replace the transitions with $L_0$, $L$
and $M$ in input by $L_0(q_{id}, q_t, q_t) \xrightarrow{2 \approx 3} q_L$, $L_0(q_{id}, q_t, q_t) \xrightarrow{2 \not\approx 3} q'_L$,
$L(q_{id}, q_t, q_t, q_L) \xrightarrow{2 \approx 3} q_L$, $L(q_{id}, q_t, q_t, q_L) \xrightarrow{2 \not\approx 3} q'_L$,
$M(q_{id}, q_t, q_t, q_L) \xrightarrow{2 \approx 3} q_M$, $M(q_{id}, q_t, q_t, q_L) \xrightarrow{2 \not\approx 3} q'_M$,
where $q'_L$ and $q'_M$ are new states meaning that there was an anomaly. We also add a transition
$L(q_{id}, q_t, q_t, q'_L) \to q'_L$ to propagate $q'_L$, and $M(q_{id}, q_t, q_t, q'_L) \to q'_M$.

By keeping the set of final states as $\{q_M\}$, the recognized language of the TABG obtained
is the set of records well cooked, i.e. such that for all dishes, the real time to cook is equal to
the theoretical time. By redefining the set of final states as $\{q'_M\}$, the recognized language
is the set of records with an anomaly.



3.2. Decision Problems. The membership is the problem to decide, given a term $t \in \mathcal{T}(\Sigma)$
and a TABG $\mathcal{A}$ over $\Sigma$, whether $t \in \mathcal{L}(\mathcal{A})$.

Proposition 3.7. Membership is NP-complete for TABG, by assuming that the maximum
arity of the signature $\Sigma$ is a constant for the problem.

Proof. In order to prove that this problem is in NP, given a TABG $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$
and a term $t \in \mathcal{T}(\Sigma)$, we can non-deterministically guess a function $M$ from $Pos(t)$ into
$\Delta$, and check that $(t, M)$ is a successful run of $\mathcal{A}$ on $t$. The checking can be performed
in polynomial time. In particular, testing equivalence modulo $E$ can be performed in
polynomial time using a dynamic programming scheme, by assuming that the maximum
arity of $\Sigma$ is a constant of the problem, which is a usual assumption. More general results
are given in [Nie96, CHJ94]. For NP-hardness, [FTT08, JKV09] present PTIME reductions
of the satisfiability of Boolean expressions into membership for TAG$^{\wedge}$[$\approx$] whose constraints
are conjunctions of equalities of the form $q \approx q$. □
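
The guess-and-check argument can be made concrete by replacing the non-deterministic guess with an exhaustive enumeration of runs, which takes exponential time in general. The sketch below does this for the TAG[$\approx$] of Example 3.3. It assumes an empty equational theory, no local constraints, terms as nested tuples, and runs represented as dictionaries from positions to states; all function names are illustrative.

```python
from itertools import product

def subterm(t, p):
    for i in p:
        t = t[i]
    return t

def runs(t, rules, prefix=()):
    """Yield each run of the underlying TA on t as a dict: position -> state."""
    child_runs = [list(runs(c, rules, prefix + (i,)))
                  for i, c in enumerate(t[1:], start=1)]
    for combo in product(*child_runs):
        sources = tuple(r[prefix + (i,)] for i, r in enumerate(combo, start=1))
        for q in sorted(rules.get((t[0], sources), ())):
            run = {prefix: q}
            for r in combo:
                run.update(r)
            yield run

def satisfies(t, run, atoms):
    """atoms is a conjunction of (q1, q2, equal) triples, checked syntactically."""
    for q1, q2, equal in atoms:
        for p, p2 in product(run, repeat=2):
            if p != p2 and run[p] == q1 and run[p2] == q2:
                if (subterm(t, p) == subterm(t, p2)) != equal:
                    return False
    return True

def member(t, rules, final, atoms):
    """Check every candidate run: final state at the root, global constraint satisfied."""
    return any(run[()] in final and satisfies(t, run, atoms)
               for run in runs(t, rules))

# TAG of Example 3.3: a -> q0 | q1, f(q0,q0) -> q0 | q1, f(q1,q1) -> qf, with q1 ~ q1.
RULES = {
    ("a", ()): {"q0", "q1"},
    ("f", ("q0", "q0")): {"q0", "q1"},
    ("f", ("q1", "q1")): {"qf"},
}
FINAL = {"qf"}
ATOMS = [("q1", "q1", True)]

a = ("a",)
faa = ("f", a, a)
print(member(("f", faa, faa), RULES, FINAL, ATOMS))
print(member(("f", a, faa), RULES, FINAL, ATOMS))
```

Checking one candidate run is polynomial (a quadratic number of subterm comparisons per atom), while the number of candidate runs may be exponential in the size of the term.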
Recall that for plain TA, membership is in PTIME.

Universality is the problem of deciding, given a TABG $\mathcal{A}$ over $\Sigma$, whether $\mathcal{L}(\mathcal{A}) = \mathcal{T}(\Sigma)$.
It is known to be undecidable already for a small subclass of TAG.

Proposition 3.8. [FTT08, JKV09] Universality is undecidable for TAG$^{\wedge}[\approx]$.

The following consequence is a new result for TAGED.

Proposition 3.9. It is undecidable whether the language of a given TAG$^{\wedge}[\approx]$ is regular.

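Proof. We show that universality is reducible to regularity using a new function symbol $f$
with arity 2, and any non-regular language $L$ which is recognizable by a TAG$^{\wedge}[\approx]$ (such a
language exists).

Let $\mathcal{A}$ be an input of universality for TAG$^{\wedge}[\approx]$ and let

$$L' = \{ f(t_1, t_2) \mid t_1 \in \mathcal{T}(\Sigma) \wedge t_2 \in L \} \cup \{ f(t_1, t_2) \mid t_1 \in \mathcal{L}(\mathcal{A}) \wedge t_2 \in \mathcal{T}(\Sigma) \}.$$

It is possible to compute a new TAG$^{\wedge}[\approx]$ $\mathcal{A}'$ recognizing the language $L'$ (see Lemma 4.19).
Thus, in order to conclude, it suffices to show that $\mathcal{L}(\mathcal{A}) = \mathcal{T}(\Sigma)$ if and only if $\mathcal{L}(\mathcal{A}')$ is
regular. For this purpose let us first define the quotient of a term language $R$ by a term
$s$ with respect to a function symbol $f$: $R/s := \{ t \mid f(s, t) \in R \}$. This operation preserves
regular languages: for all $s$ and $f$, if $R$ is regular then $R/s$ is regular.

If $\mathcal{L}(\mathcal{A}) = \mathcal{T}(\Sigma)$, then $\mathcal{L}(\mathcal{A}')$ is $\{ f(t_1, t_2) \mid t_1, t_2 \in \mathcal{T}(\Sigma) \}$, which is regular. Assume
that $\mathcal{L}(\mathcal{A}) \neq \mathcal{T}(\Sigma)$ and let $s \in \mathcal{T}(\Sigma) \setminus \mathcal{L}(\mathcal{A})$. By construction, $\mathcal{L}(\mathcal{A}')/s = L$, which is not
regular. Hence $\mathcal{L}(\mathcal{A}')$ is not regular. □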

Emptiness is the problem of deciding, given a TABG $\mathcal{A}$, whether $\mathcal{L}(\mathcal{A}) = \emptyset$. The proof
that it is decidable for TABG is rather involved and is presented in Section 5.

4. Arithmetic Constraints and Reduction to TABG$^{\wedge}$

This section has two goals. The first goal is to present an extension of TABG by allowing
certain global arithmetic constraints. They are interesting in themselves since they allow
the representation of several natural properties in a simple way. The second goal is to
show that the class of TABG languages coincides (in expressiveness) with the class of TABG$^{\wedge}$
languages. In other words, for each TABG there exists a TABG$^{\wedge}$ recognizing the same language.
This reduction will be very useful in Section 5 in order to prove decidability of emptiness
of TABG.

The reason for presenting both results in the same section is that arithmetic constraints
simplify the task of transforming a TABG into a TABG$^{\wedge}$ representing the same language. This
is because negations can be replaced by arithmetic constraints with an equivalent meaning
in a first intermediate step, and such constraints are easier to deal with.

All this work is developed in Subsection 4.2. Before that, in Subsection 4.1, we present
a more general form of arithmetic constraints for which emptiness is undecidable. The
motivation of this first subsection is to show the limits of positive results in this setting,
and to justify the limited form of the constraints in Subsection 4.2.
4.1. Global Integer Linear Constraints. Let $Q$ be a set of states. A linear inequality
over $Q$ is an expression of the form $\sum_{q \in Q} a_q \cdot |q| \geq a$ or $\sum_{q \in Q} a_q \cdot \|q\| \geq a$ where every $a_q$
and $a$ belong to $\mathbb{Z}$. We consider the above linear inequalities as atomic constraints of tree
automata with global constraints, and denote by $|.|_{\mathbb{Z}}$ and $\|.\|_{\mathbb{Z}}$ their respective types. The
type $\mathbb{Z}$ denotes $|.|_{\mathbb{Z}}$ and $\|.\|_{\mathbb{Z}}$ together.

Using the notation introduced in Section 3, TABG$[\approx, \not\approx, |.|_{\mathbb{Z}}, \|.\|_{\mathbb{Z}}]$ (or TABG$[\approx, \not\approx, \mathbb{Z}]$)
denotes the class of tree automata with global and brother constraints modulo a flat theory
of the form $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ such that $(Q, \Sigma, F, \Delta, E)$ is a TAB (denoted $tab(\mathcal{A})$) and
$C$ is a Boolean combination of atomic constraints which can be linear inequalities as above
or equality or disequality constraints of the form $q \approx q'$ or $q \not\approx q'$, with $q, q' \in Q$.

Let $\mathcal{A}$ be a TABG$[\approx, \not\approx, |.|_{\mathbb{Z}}, \|.\|_{\mathbb{Z}}]$ over $\Sigma$ with state set $Q$ and flat equational theory $E$,
let $r$ be a run of $tab(\mathcal{A})$ on a term $t \in \mathcal{T}(\Sigma)$ and let $q \in Q$. Intuitively, the interpretation of $|q|$
with respect to $r$ is the number of occurrences of $q$ in $r$, i.e. the number of positions $p$ such
that $r(p) = q$. The interpretation of $\|q\|$ with respect to $r$ is the number of different subterms
(modulo $E$) in $t$ reaching state $q$ with $r$, i.e. the maximum number $n$ of positions $p_1, p_2, \ldots, p_n$
such that $r(p_1) = r(p_2) = \ldots = r(p_n) = q$ and the terms $t|_{p_1}, t|_{p_2}, \ldots, t|_{p_n}$ are
pairwise different (modulo $E$). More formally, the interpretations of $|q|$ and $\|q\|$ with respect
to $r$ (and $t$) are defined, respectively, by the following cardinalities:

$$[\![\, |q| \,]\!]_r = |\{ p \mid p \in Pos(t) \wedge r(p) = q \}|$$
$$[\![\, \|q\| \,]\!]_r = |\{ [t|_p]_E \mid p \in Pos(t) \wedge r(p) = q \}|.$$

This allows us to define the satisfiability of linear inequalities with respect to $r$ and $t$:
$r \models \sum_{q \in Q} a_q \cdot |q| \geq a$ holds if and only if $\sum_{q \in Q} a_q \cdot [\![\, |q| \,]\!]_r \geq a$ holds, and
$r \models \sum_{q \in Q} a_q \cdot \|q\| \geq a$ holds if and only if $\sum_{q \in Q} a_q \cdot [\![\, \|q\| \,]\!]_r \geq a$ holds.
The satisfiability of the global constraint $C$ of $\mathcal{A}$ by $r$, denoted $r \models C$, is defined
accordingly, and if $r \models C$ then $r$ is called a run of $\mathcal{A}$. A run $r$ of $\mathcal{A}$ on $t \in \mathcal{T}(\Sigma)$ is
successful (or accepting) if $r(\Lambda) \in F$, i.e. if the state at the root position is final. The
language $\mathcal{L}(\mathcal{A})$ of $\mathcal{A}$ is the set of terms $t$ for which there exists a successful run of $\mathcal{A}$.
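
The two interpretations can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the paper: terms and runs are encoded as nested tuples of the same shape, the run carries the state reached at each position, and the theory $E$ is empty, so that equivalence modulo $E$ is syntactic equality. The names `positions`, `occurrences`, `distinct_subterms` and `holds` are hypothetical.

```python
from collections import Counter

# Assumed encoding: a term is a nested tuple (symbol, child, ...); a run has
# the same shape but stores the state reached at each position.

def positions(tree, prefix=()):
    """Yield (position, subtree) pairs; the root position is ()."""
    yield prefix, tree
    for i, child in enumerate(tree[1:], start=1):
        yield from positions(child, prefix + (i,))

def occurrences(run):
    """Interpretation of |q| for every state q: positions p with r(p) = q."""
    return Counter(sub[0] for _, sub in positions(run))

def distinct_subterms(term, run):
    """Interpretation of ||q|| for every state q, assuming E is empty."""
    classes = {}
    for (_, t_sub), (_, r_sub) in zip(positions(term), positions(run)):
        classes.setdefault(r_sub[0], set()).add(t_sub)
    return {q: len(subterms) for q, subterms in classes.items()}

def holds(coeffs, values, bound):
    """Check sum_q a_q * value(q) >= bound for integer coefficients a_q."""
    return sum(a * values.get(q, 0) for q, a in coeffs.items()) >= bound

term = ("f", ("g", ("a",)), ("g", ("a",)), ("b",))
run = ("qf", ("q", ("qa",)), ("q", ("qa",)), ("qa",))
occ = occurrences(run)
dist = distinct_subterms(term, run)
```

Here the state `q` occurs twice but is reached by a single subterm, so a constraint such as $\|q\| \leq 1$, written as $-\|q\| \geq -1$, is checked by `holds({"q": -1}, dist, -1)`.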

Example 4.1. Let us add a new argument to the dishes of the menu of Example 3.4 which
represents the price coded on two digits by a term $N(d_1, d_0)$. We add a new state $q_p$ for
the type of prices, and other states $q_{cheap}$, $q_{moderate}$, $q_{expensive}$, $q_{chic}$ describing price level
ranges, and transitions $0|1 \to q_{cheap}$, $2|3 \to q_{moderate}$, $4|5|6 \to q_{expensive}$, $7|8|9 \to q_{chic}$ and
$N(q_{cheap}, q_d) \to q_p, \ldots$ The price is a new argument of $L_0$, $L$ and $M$, hence we replace
the transitions with these symbols in input by $L_0(q_{id}, q_t, q_p) \to q_L$, $L(q_{id}, q_t, q_p, q_L) \to q_L$,
$M(q_{id}, q_t, q_p, q_L) \to q_M$. We can use a linear inequality $|q_{cheap}| + |q_{moderate}| - |q_{expensive}| -
|q_{chic}| \geq 0$ to characterize the moderate menus, and $|q_{expensive}| + |q_{chic}| \geq 6$ to characterize
the menus with too many expensive dishes. A linear inequality $\|q_p\| \leq 1$ expresses that all
the dishes have the same price.

The class TAG$[|.|_{\mathbb{Z}}]$ has been studied under different names (e.g. Parikh automata
in [KR02], linear constraint tree automata in [BMSL09]) and it has a decidable emptiness
test. Indeed, the set of successful runs of a given TA with state set $Q$ is a context-free
language (seeing runs as words of $Q^*$), and the Parikh projection (the set of tuples over
$\mathbb{N}^{|Q|}$ whose components are the $[\![\, |q| \,]\!]_r$ for every run $r$) of such a language is a semi-linear
set. The idea for deciding emptiness for a TAG$[|.|_{\mathbb{Z}}]$ $\mathcal{A}$ is to compute this semi-linear set
and to test the emptiness of its intersection with the set of solutions in $\mathbb{N}^{|Q|}$ of $C$, the
arithmetic constraint of $\mathcal{A}$ (a Boolean combination of linear inequalities of type $|.|_{\mathbb{Z}}$), which
is also semi-linear. This can be done in NPTIME, see [BMSL09].

To our knowledge, the class TAG$[\|.\|_{\mathbb{Z}}]$ with global constraints counting the number of
distinct subterms in each state has not been studied, even modulo an empty theory.

Combining constraints of type $\approx$ and counting constraints of type $|.|_{\mathbb{Z}}$, however, leads to
undecidability.

Theorem 4.2. Emptiness is undecidable for TAG$^{\wedge}[\approx, |.|_{\mathbb{Z}}]$.

Proof. We consider Hilbert's tenth problem, that is, solvability of an input equation $P = 0$
where $P$ is a polynomial with integer coefficients and variables ranging over the natural
numbers. This problem is known to be undecidable, and with the addition of new variables it is
easily reducible to a question of the form $\exists x_1 \ldots \exists x_n : e_1 \wedge \ldots \wedge e_m$, where $x_1, \ldots, x_n$ are
variables ranging over the natural numbers, and $e_1, \ldots, e_m$ are equations that are either of
the form $x_j + x_k = x_t$ or $x_j * x_k = x_t$ or $x_j = 1$ or $x_j = 0$. We reduce this last problem to
emptiness of TAG$^{\wedge}[\approx, |.|_{\mathbb{Z}}]$.

We consider an instance $\varphi = \exists x_1 \ldots \exists x_n : e_1 \wedge \ldots \wedge e_m$. Without loss of generality, we
assume that $e_1, \ldots, e_{m'}$ for $m' \leq m$ are all the equations of the form $x_j * x_k = x_t$, and that
for each such equation, the indexes $j, k, t$ are different. We will construct a TAG$^{\wedge}[\approx, |.|_{\mathbb{Z}}]$
$\mathcal{A}$ such that $\varphi$ is true if and only if $\mathcal{L}(\mathcal{A})$ is not empty.

Since the construction of $\mathcal{A}$ is technical, let us first give some intuitions (see Figure 3).
Consider a possible assignment $x_1 := v_1, \ldots, x_n := v_n$. A concrete run of $\mathcal{A}$ will be able
to check whether this assignment proves that $\varphi$ is true, and only accept the corresponding
term if the answer is positive. In this run, there will be $v_1$ occurrences of state $q_{|x_1|}$, $v_2$
occurrences of state $q_{|x_2|}$, and so on. Equations of the form $x_j + x_k = x_t$, $x_j = 1$ and $x_j = 0$
can directly be checked by constraints of the form $|q_{|x_j|}| + |q_{|x_k|}| = |q_{|x_t|}|$, $|q_{|x_j|}| = 1$ and
$|q_{|x_j|}| = 0$.

For each equation $e_i$ of the form $x_j * x_k = x_t$ there will be $v_k$ occurrences of a state called
$q_{e_i,|x_k|}$. This is ensured by the constraint $|q_{e_i,|x_k|}| = |q_{|x_k|}|$. Under each of these occurrences,
there will be the same term, reaching a state $q_{e_i,x_j}$, and containing $v_j$ occurrences of a
state $q_{e_i,|x_t|}$. The uniqueness of this term, as well as the number of occurrences of $q_{e_i,|x_t|}$,
are both ensured by an equality constraint $q_{x_j} \approx q_{e_i,x_j}$. In summary, there will be $v_j * v_k$
occurrences of state $q_{e_i,|x_t|}$. The satisfiability of the equation $x_j * x_k = x_t$ will be checked
by the constraint $|q_{|x_t|}| = |q_{e_i,|x_t|}|$.

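The components of the TAG$^{\wedge}[\approx, |.|_{\mathbb{Z}}]$ $\mathcal{A} = (Q, \Sigma, F, \Delta, C)$ are defined as follows:

$$\begin{aligned}
Q ={}& \{ q_{accept}, q_a \} \cup \{ q_{|x_j|}, q_{x_j} \mid j \in \{1, \ldots, n\} \} \cup \{ q_{e_i} \mid i \in \{1, \ldots, m'\} \} \,\cup \\
& \{ q_{e_i,x_j}, q_{e_i,|x_t|}, q_{e_i,|x_k|} \mid i \in \{1, \ldots, m'\},\ e_i = (x_j * x_k = x_t) \} \\
\Sigma ={}& \{ a : 0,\ g : 1,\ h : 2,\ f : n + m' \} \\
F ={}& \{ q_{accept} \} \\
\Delta ={}& \{ a \to q_a,\ f(q_{x_1}, \ldots, q_{x_n}, q_{e_1}, \ldots, q_{e_{m'}}) \to q_{accept} \} \,\cup \\
& \{ g(q_a) \to q_{|x_j|},\ g(q_a) \to q_{x_j},\ g(q_{|x_j|}) \to q_{|x_j|},\ g(q_{|x_j|}) \to q_{x_j} \mid j \in \{1, \ldots, n\} \} \,\cup \\
& \{ g(q_a) \to q_{e_i,|x_t|},\ g(q_a) \to q_{e_i,x_j},\ g(q_{e_i,|x_t|}) \to q_{e_i,|x_t|},\ g(q_{e_i,|x_t|}) \to q_{e_i,x_j}, \\
& \ \ h(q_{e_i,x_j}, q_a) \to q_{e_i,|x_k|},\ h(q_{e_i,x_j}, q_{e_i,|x_k|}) \to q_{e_i,|x_k|},\ h(q_a, q_{e_i,|x_k|}) \to q_{e_i}, \\
& \ \ h(q_a, q_a) \to q_{e_i} \mid i \in \{1, \ldots, m'\},\ e_i = (x_j * x_k = x_t) \}
\end{aligned}$$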
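$$\begin{aligned}
C ={}& \bigwedge_{m' < i \leq m,\ e_i = (x_j + x_k = x_t)} |q_{|x_j|}| + |q_{|x_k|}| = |q_{|x_t|}| \ \wedge \\
& \bigwedge_{m' < i \leq m,\ e_i = (x_j = 1)} |q_{|x_j|}| = 1 \ \wedge \\
& \bigwedge_{m' < i \leq m,\ e_i = (x_j = 0)} |q_{|x_j|}| = 0 \ \wedge \\
& \bigwedge_{1 \leq i \leq m',\ e_i = (x_j * x_k = x_t)} \big( |q_{e_i,|x_t|}| = |q_{|x_t|}| \wedge |q_{e_i,|x_k|}| = |q_{|x_k|}| \wedge q_{x_j} \approx q_{e_i,x_j} \big)
\end{aligned}$$

Figure 3: Accepting run of $s = f(g^{v_1+1}(a), \ldots, g^{v_n+1}(a), s_{e_1}, \ldots, s_{e_{m'}})$ and the subrun of
$s_{e_i}$, where $e_i$ is of the form $x_j * x_k = x_t$.
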

It remains to prove that $\varphi$ is true if and only if $\mathcal{L}(\mathcal{A})$ is not empty. To this end,
let us first assume that $x_1 := v_1, \ldots, x_n := v_n$ is a solution of $\varphi$. In order to simplify
the presentation, we denote the term $h(a, h(s, h(s, \ldots, h(s, a) \ldots)))$, with $k$ occurrences
of $s$, by $h[a, s, \ldots (k) \ldots, s, a]$, and given an equation $e_i = (x_j * x_k = x_t)$, we denote
the term $h[a, g^{v_j+1}(a), \ldots (v_k) \ldots, g^{v_j+1}(a), a]$ by $s_{e_i}$. Let us consider the term $s =
f(g^{v_1+1}(a), \ldots, g^{v_n+1}(a), s_{e_1}, \ldots, s_{e_{m'}})$. It is not difficult to see that the run of Figure 3
is an accepting run of $s$. Note that for each equation $e_i = (x_j * x_k = x_t)$, the constraints
$|q_{e_i,|x_t|}| = |q_{|x_t|}|$, $|q_{e_i,|x_k|}| = |q_{|x_k|}|$, $q_{x_j} \approx q_{e_i,x_j}$ are satisfied, since $x_j := v_j$, $x_k := v_k$, $x_t :=
v_t$ satisfies the equation.

Now, assume that there is an accepting run $r$ of $\mathcal{A}$ on a term $s$. Since $r$ is accepting,
the transition rule $f(q_{x_1}, \ldots, q_{x_n}, q_{e_1}, \ldots, q_{e_{m'}}) \to q_{accept}$ is applied at the root of $s$.
According to the form of the rules involving $q_{x_1}, \ldots, q_{x_n}$, it holds that $s$ is of the form
$s = f(g^{v_1+1}(a), \ldots, g^{v_n+1}(a), s_{e_1}, \ldots, s_{e_{m'}})$, for some natural numbers $v_1, \ldots, v_n$ and some
terms $s_{e_1}, \ldots, s_{e_{m'}}$. Moreover, the states $q_{|x_1|}, \ldots, q_{|x_n|}$ have $v_1, \ldots, v_n$ occurrences,
respectively. It remains to see that the assignment $x_1 := v_1, \ldots, x_n := v_n$ makes $\varphi$ true. The
satisfiability of a constraint of the form $|q_{|x_j|}| + |q_{|x_k|}| = |q_{|x_t|}|$ (or $|q_{|x_j|}| = 1$ or $|q_{|x_j|}| = 0$)
implies that $v_j + v_k = v_t$ (or $v_j = 1$ or $v_j = 0$), thus an equation of the form $x_j + x_k = x_t$
(or $x_j = 1$ or $x_j = 0$) holds with this assignment. It remains to see that every equation $e_i$
of the form $x_j * x_k = x_t$ also holds with this assignment. According to the form of the rules
of $\mathcal{A}$ and the satisfiability of the constraints $|q_{e_i,|x_k|}| = |q_{|x_k|}|$, $q_{x_j} \approx q_{e_i,x_j}$, the term $s_{e_i}$ is
of the form $h[a, g^{v_j+1}(a), \ldots (v_k) \ldots, g^{v_j+1}(a), a]$. Moreover, $q_{e_i,|x_t|}$ has $v_j * v_k$ occurrences.
Therefore, by the satisfiability of the constraint $|q_{e_i,|x_t|}| = |q_{|x_t|}|$ it follows that $v_j * v_k = v_t$, and
hence the equation $x_j * x_k = x_t$ holds with this assignment, and we are done. □
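
The counting argument behind a product equation $x_j * x_k = x_t$ can be checked on a small Python sketch. It is only an illustration under assumptions: runs are encoded as nested tuples of state names, the chains are taken to be $g^{v_j+1}(a)$ as the rules of $\Delta$ suggest, and the helper names and the string names of the states are hypothetical.

```python
def chain_run(v, inner, top):
    """Run on g^(v+1)(a): v occurrences of `inner` below one `top` state."""
    run = ("q_a",)
    for _ in range(v):
        run = (inner, run)
    return (top, run)

def product_gadget_run(vj, vk):
    """Run on s_e = h[a, g^(vj+1)(a), ...(vk)..., g^(vj+1)(a), a]."""
    if vk == 0:
        # rule h(q_a, q_a) -> q_e
        return ("q_e", ("q_a",), ("q_a",))
    chain = chain_run(vj, "q_e_|xt|", "q_e_xj")
    # innermost h(s, a), using rule h(q_e_xj, q_a) -> q_e_|xk|
    spine = ("q_e_|xk|", chain, ("q_a",))
    for _ in range(vk - 1):
        # rule h(q_e_xj, q_e_|xk|) -> q_e_|xk|
        spine = ("q_e_|xk|", chain, spine)
    # rule h(q_a, q_e_|xk|) -> q_e at the top of the gadget
    return ("q_e", ("q_a",), spine)

def count(run, state):
    """Number of positions of the run labelled with `state`."""
    return (run[0] == state) + sum(count(child, state) for child in run[1:])

gadget = product_gadget_run(3, 4)
```

With $v_j = 3$ and $v_k = 4$, the state standing for $q_{e_i,|x_k|}$ occurs $4$ times and the state standing for $q_{e_i,|x_t|}$ occurs $3 * 4$ times, which is the quantity the constraint $|q_{e_i,|x_t|}| = |q_{|x_t|}|$ compares with $v_t$.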

4.2. Global Natural Linear Constraints. We now present a restriction on linear
inequalities which enables a decidable emptiness test when combined with $\approx$ and $\not\approx$ as global
constraints. A natural linear inequality over $Q$ is a linear inequality as above whose
coefficients $a_q$ and $a$ all have the same sign. We call them natural since it is equivalent to consider
inequalities in both directions whose coefficients are all non-negative, like $\sum a_q \cdot |q| \leq a$,
with $a_q, a \in \mathbb{N}$, to refer to $\sum -a_q \cdot |q| \geq -a$. We also consider linear equalities $\sum a_q \cdot |q| = a$,
with $a_q, a \in \mathbb{N}$, to refer to a conjunction of two natural linear inequalities.

The types of the natural linear inequalities are denoted by $|.|_{\mathbb{N}}$ and $\|.\|_{\mathbb{N}}$. Below, we
shall abbreviate these two types by $\mathbb{N}$.

The main difference between the linear inequalities of type $|.|_{\mathbb{Z}}$ and $|.|_{\mathbb{N}}$ (and respectively
$\|.\|_{\mathbb{Z}}$ and $\|.\|_{\mathbb{N}}$) is that the former allow comparing the respective numbers of occurrences
of two states, as in $|q| \leq |q'|$, whereas the latter only allow comparing the number
of occurrences of one state (or a sum of the numbers of occurrences of several states with
coefficients) to a constant, as in $|q| \leq 4$ or $|q| + 2|q'| \leq 9$.

In the rest of the subsection we show that TABG$[\approx, \not\approx, \mathbb{N}]$ has the same expressiveness
as TABG$^{\wedge}[\approx, \not\approx]$. The proof works in several steps:

• First, we define the notion of normalized TABG$[\approx, \not\approx, \mathbb{N}]$, that is, a TABG$[\approx, \not\approx, \mathbb{N}]$ with a
constraint being a disjunction of conjunctions of literals in a simple form.

• Second, we remove negative literals of the form $\neg(q \approx q')$ or $\neg(q \not\approx q')$, obtaining a list
of TABG$^{\wedge}[\approx, \not\approx, \mathbb{N}]$ such that the union of their languages coincides with the language of
the original TABG$[\approx, \not\approx, \mathbb{N}]$. In this step we use arithmetic constraints for simulating the
removed negative literals.

• Third, we remove arithmetic literals of type $\|.\|_{\mathbb{N}}$, obtaining a new list of TABG$^{\wedge}[\approx, \not\approx, |.|_{\mathbb{N}}]$
such that the union of their languages coincides with the language of the original
TABG$[\approx, \not\approx, \mathbb{N}]$. In this step we use positive literals of types $\approx$, $\not\approx$, and $|.|_{\mathbb{N}}$ in order to simulate
the removed literals of type $\|.\|_{\mathbb{N}}$.

• Fourth, we remove arithmetic literals of type $|.|_{\mathbb{N}}$, obtaining a new list of TABG$^{\wedge}[\approx, \not\approx]$ such
that the union of their languages coincides with the language of the original TABG$[\approx, \not\approx, \mathbb{N}]$.
In this step, new states are used for counting the number of occurrences of original states.

• Finally, we show that TABG$^{\wedge}[\approx, \not\approx]$ are closed under union. Hence, we obtain a single
TABG$^{\wedge}[\approx, \not\approx]$ whose language coincides with the one of the original TABG$[\approx, \not\approx, \mathbb{N}]$.

Definition 4.3. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a TABG$[\approx, \not\approx, \mathbb{N}]$. The constraint $C$ is
normalized if it is either true or false or a disjunction of conjunctions of literals, where all
arithmetic literals are positive.

Recall that the positive arithmetic literals are either of the form $a_1\|q_1\| + \ldots +
a_n\|q_n\| \otimes k$ or $a_1|q_1| + \ldots + a_n|q_n| \otimes k$, with $\otimes$ in $\{\geq, \leq, =\}$, $n > 0$, $k \geq 0$ and strictly positive
$a_1, \ldots, a_n$.

Lemma 4.4. Any TABG$[\approx, \not\approx, \mathbb{N}]$ can be effectively transformed into a normalized
TABG$[\approx, \not\approx, \mathbb{N}]$ with the same equational theory and preserving the language.

Proof. First, by applying De Morgan's laws, negations are moved inwards so that each
negation is applied to just an atom. Second, negative arithmetic literals are made positive by
simple transformations: inequalities are inverted and equalities become disjunctions of
inequalities. Third, strict inequalities are converted into non-strict ones by adding or subtracting 1
on one side. Fourth, by applying simple arithmetic operations all such literals are brought into the
required form $a_1\|q_1\| + \ldots + a_n\|q_n\| \otimes k$ or $a_1|q_1| + \ldots + a_n|q_n| \otimes k$ for $\otimes$ in $\{\geq, \leq, =\}$, $n > 0$
and strictly positive $a_1, \ldots, a_n$. In this step, a trivially false literal is replaced by false, and
a trivially true literal is replaced by true. Finally, by applying the standard transformation
into disjunctive normal form we get the desired result. □
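
The second to fourth steps of this proof can be sketched in Python for a single arithmetic literal $S \otimes k$, where $S$ is a sum of counts with strictly positive coefficients (hence $S \geq 0$). The encoding of a literal as a pair `(op, k)` and the function names are assumptions made for this illustration only.

```python
def simplify(op, k):
    """Return True or False for a trivial literal 'S op k', else (op, k).

    S is a sum of natural numbers with strictly positive coefficients,
    so S >= 0 always holds.
    """
    if op == ">=" and k <= 0:
        return True
    if op in ("<=", "==") and k < 0:
        return False
    return (op, k)

def negate(op, k):
    """Disjuncts equivalent to not(S op k), all positive and non-strict."""
    if op == ">=":        # not(S >= k)  iff  S < k  iff  S <= k - 1
        cases = [("<=", k - 1)]
    elif op == "<=":      # not(S <= k)  iff  S > k  iff  S >= k + 1
        cases = [(">=", k + 1)]
    elif op == "==":      # not(S = k)   iff  S <= k - 1  or  S >= k + 1
        cases = [("<=", k - 1), (">=", k + 1)]
    else:
        raise ValueError(f"unknown operator: {op}")
    return [simplify(o, c) for o, c in cases]
```

For instance, `negate("==", 0)` yields one trivially false disjunct and the literal $S \geq 1$, which matches the replacement of trivial literals by false or true described in the proof.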

In order to remove negative equality and disequality literals and positive arithmetic
constraints, we use the idea of inserting new states which are synonyms of existing states.
Intuitively, a synonym is a new state $\bar{q}$ that behaves analogously to an existing state $q$, i.e.
the rules and constraints are modified such that the relation of $\bar{q}$ with the other states is the
same as for $q$. Nevertheless, the constraints are further modified to ensure that, whenever
$q$ occurs in an execution, $\bar{q}$ also occurs. Moreover, all subterms reaching $\bar{q}$ are the same
(or equivalent modulo the relation induced by the flat theory), but are different from
(non-equivalent to) the ones reaching $q$. This way, an execution of the original automaton with
occurrences of $q$ can be transformed into an execution of the new automaton, where the
occurrences of a concrete subterm (up to the equivalence relation) reaching $q$ in the original
execution now reach $\bar{q}$ instead.

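Definition 4.5. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a TABG$[\approx, \not\approx, \mathbb{N}]$. Let $q$ be a state in $Q$. Let
$\bar{q}$ be a state not in $Q$.

We define $F_{q \mapsto \bar{q}}$ as $F$ if $q$ is not in $F$, and as $F \cup \{\bar{q}\}$ if $q$ is in $F$.

We define $\Delta_{q \mapsto \bar{q}}$ as the set of rules obtained from the rules of $\Delta$ with all possible
replacements of occurrences of $q$ by $\bar{q}$. More formally, $\Delta_{q \mapsto \bar{q}}$ is $\{ f(q'_1, \ldots, q'_n) \to q'_{n+1} \mid
\exists (f(q_1, \ldots, q_n) \to q_{n+1}) \in \Delta : \forall i \in \{1, \ldots, n+1\} : (q'_i = q_i \vee (q_i = q \wedge q'_i = \bar{q})) \}$.

We define $C_{q \mapsto \bar{q}}$ as the constraint $((\|q\| = 0 \wedge \|\bar{q}\| = 0) \vee (\|\bar{q}\| = 1 \wedge q \not\approx \bar{q})) \wedge C'$,
where $C'$ is obtained from the normalization of $C$ by replacing each literal by a new formula
according to the following description.

• Each literal $(q_1 \approx q_2)$ is replaced by the conjunction of the literals of the set $\{ q'_1 \approx q'_2 \mid
(q'_1 = q_1 \vee (q_1 = q \wedge q'_1 = \bar{q})) \wedge (q'_2 = q_2 \vee (q_2 = q \wedge q'_2 = \bar{q})) \}$.

• Each literal $(q_1 \not\approx q_2)$ is replaced by the conjunction of the literals of the set $\{ q'_1 \not\approx q'_2 \mid
(q'_1 = q_1 \vee (q_1 = q \wedge q'_1 = \bar{q})) \wedge (q'_2 = q_2 \vee (q_2 = q \wedge q'_2 = \bar{q})) \}$.

• Each literal $\neg(q_1 \approx q_2)$ is replaced by the disjunction of the literals of the set $\{ \neg(q'_1 \approx
q'_2) \mid (q'_1 = q_1 \vee (q_1 = q \wedge q'_1 = \bar{q})) \wedge (q'_2 = q_2 \vee (q_2 = q \wedge q'_2 = \bar{q})) \}$.

• Each literal $\neg(q_1 \not\approx q_2)$ is replaced by the disjunction of the literals of the set $\{ \neg(q'_1 \not\approx
q'_2) \mid (q'_1 = q_1 \vee (q_1 = q \wedge q'_1 = \bar{q})) \wedge (q'_2 = q_2 \vee (q_2 = q \wedge q'_2 = \bar{q})) \}$.

• Each occurrence of $|q|$ is replaced by $|q| + |\bar{q}|$, and each occurrence of $\|q\|$ is replaced by
$\|q\| + \|\bar{q}\|$.

We define $\mathcal{A}_{q \mapsto \bar{q}}$ as $(Q \cup \{\bar{q}\}, \Sigma, F_{q \mapsto \bar{q}}, \Delta_{q \mapsto \bar{q}}, E, C_{q \mapsto \bar{q}})$.

We write $(F_{q \mapsto \bar{q}})_{q' \mapsto \bar{q}'}$ for $q \neq q'$ and $\bar{q} \neq \bar{q}'$ more succinctly as $F_{q \mapsto \bar{q}, q' \mapsto \bar{q}'}$, and similarly
for $\Delta_{q \mapsto \bar{q}, q' \mapsto \bar{q}'}$, $C_{q \mapsto \bar{q}, q' \mapsto \bar{q}'}$ and $\mathcal{A}_{q \mapsto \bar{q}, q' \mapsto \bar{q}'}$.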

The condition $(\|q\| = 0 \wedge \|\bar{q}\| = 0)$ added to $C_{q \mapsto \bar{q}}$ is necessary to satisfy $\mathcal{L}(\mathcal{A}_{q \mapsto \bar{q}}) =
\mathcal{L}(\mathcal{A})$, as proved in Lemma 4.6. This lemma is not used in the rest of the article, since
the introduction of synonyms is combined with other constraints in further transformations.
Nevertheless, we keep Lemma 4.6 since its proof gives intuition about the definition of
synonyms, and the arguments are similar to other ones appearing later.

Lemma 4.6. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a TABG$[\approx, \not\approx, \mathbb{N}]$. Let $q$ be a state in $Q$. Let $\bar{q}$
be a state not in $Q$.

Then, $\mathcal{L}(\mathcal{A}_{q \mapsto \bar{q}}) = \mathcal{L}(\mathcal{A})$.

Proof. Accepting runs of $\mathcal{A}$ having no occurrence of $q$ are also accepting runs of $\mathcal{A}_{q \mapsto \bar{q}}$. An
accepting run of $\mathcal{A}$ having occurrences of $q$ can be converted into an accepting run of $\mathcal{A}_{q \mapsto \bar{q}}$
by choosing one subterm $t$ reaching $q$ and replacing $q$ by $\bar{q}$ at all positions with subterms
equivalent to $t$ by the relation induced by $E$.

Accepting runs of $\mathcal{A}_{q \mapsto \bar{q}}$ can be converted into accepting runs of $\mathcal{A}$ by replacing each
occurrence of $\bar{q}$ by $q$. □

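The following lemma makes use of synonyms in order to remove a negative literal of
the form $\neg(q \approx q')$ preserving the language. The next one, Lemma 4.8, analogously allows
removing a negative literal of the form $\neg(q \not\approx q')$.

Lemma 4.7. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a TABG$[\approx, \not\approx, \mathbb{N}]$. Let $q, q'$ be states in $Q$.
Let $\bar{q}, \bar{q}'$ be distinct states not in $Q$. Let $C$ be of the form $\neg(q \approx q') \wedge C'$. Let $\mathcal{A}'$ be
$(Q \cup \{\bar{q}, \bar{q}'\}, \Sigma, F_{q \mapsto \bar{q}, q' \mapsto \bar{q}'}, \Delta_{q \mapsto \bar{q}, q' \mapsto \bar{q}'}, E, (\|\bar{q}\| = 1 \wedge \|\bar{q}'\| = 1 \wedge \bar{q} \not\approx \bar{q}') \wedge C'_{q \mapsto \bar{q}, q' \mapsto \bar{q}'})$.
Then, $\mathcal{L}(\mathcal{A}') = \mathcal{L}(\mathcal{A})$ holds.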

Proof. Accepting runs of $\mathcal{A}$ can be converted into accepting runs of $\mathcal{A}'$ as follows. First,
we choose two subterms $t$ and $t'$ different modulo the equivalence relation induced by $E$
and reaching $q$ and $q'$, respectively. Note that these terms must exist in order to satisfy
the literal $\neg(q \approx q')$ of $C$. Second, we replace $q$ by $\bar{q}$ at all the positions with subterms
equivalent to $t$ by the relation induced by $E$. Similarly, we replace $q'$ by $\bar{q}'$ at all the positions
with subterms equivalent to $t'$ by the relation induced by $E$. This way, the subconstraint
$\|\bar{q}\| = 1 \wedge \|\bar{q}'\| = 1 \wedge \bar{q} \not\approx \bar{q}'$ is satisfied, and $C'_{q \mapsto \bar{q}, q' \mapsto \bar{q}'}$ is satisfied as well.

Accepting runs of $\mathcal{A}'$ can be converted into accepting runs of $\mathcal{A}$ by replacing each
occurrence of $\bar{q}$ by $q$, and each occurrence of $\bar{q}'$ by $q'$. Note that the subconstraint $\|\bar{q}\| =
1 \wedge \|\bar{q}'\| = 1 \wedge \bar{q} \not\approx \bar{q}'$ ensures the existence of such occurrences, and with subterms which
are different modulo the equivalence relation induced by $E$. Thus, the literal $\neg(q \approx q')$ of
$C$ is satisfied. The constraint $C'$ is also satisfied. □

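Lemma 4.8. Consider the same assumptions as in Lemma 4.7, except that $C$ is of the
form $\neg(q \not\approx q') \wedge C'$ and the constraint of $\mathcal{A}'$ is $(\|\bar{q}\| = 1 \wedge \|\bar{q}'\| = 1 \wedge \bar{q} \approx \bar{q}') \wedge C'_{q \mapsto \bar{q}, q' \mapsto \bar{q}'}$.
Then, $\mathcal{L}(\mathcal{A}') = \mathcal{L}(\mathcal{A})$ holds.

Proof. Analogous to the proof of Lemma 4.7. □

The following definition will be used to remove literals of type $\|.\|_{\mathbb{N}}$.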

Definition 4.9. Let $C$ be a constraint, and let $k$ be a natural number. By $C_{\|q\| \mapsto k}$ we
denote the constraint obtained from $C$ by replacing all occurrences of $\|q\|$ by $k$.

The following two lemmas show how to remove literals of the form $\|q\| = 1$ or $\|q\| = 0$
preserving the language.

Lemma 4.10. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a TABG$[\approx, \not\approx, \mathbb{N}]$. Let $q$ be a state in $Q$. Let $C$
be of the form $\|q\| = 1 \wedge C'$. Let $\mathcal{A}'$ be $(Q, \Sigma, F, \Delta, E, |q| \geq 1 \wedge q \approx q \wedge C'_{\|q\| \mapsto 1})$.
Then, $\mathcal{L}(\mathcal{A}') = \mathcal{L}(\mathcal{A})$ holds.

Proof. Accepting runs of $\mathcal{A}'$ and $\mathcal{A}$ coincide because the constraints of $\mathcal{A}$ and $\mathcal{A}'$ have the
same semantics. □

Lemma 4.11. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a TABG$[\approx, \not\approx, \mathbb{N}]$. Let $q$ be a state in $Q$. Let $C$
be of the form $\|q\| = 0 \wedge C'$. Let $\mathcal{A}'$ be $(Q, \Sigma, F, \Delta, E, |q| = 0 \wedge C'_{\|q\| \mapsto 0})$.
Then, $\mathcal{L}(\mathcal{A}') = \mathcal{L}(\mathcal{A})$ holds.

Proof. Accepting runs of $\mathcal{A}'$ and $\mathcal{A}$ coincide because the constraints of $\mathcal{A}$ and $\mathcal{A}'$ have the
same semantics. □

Now, we will use the above lemmas in order to iteratively remove all negative literals
and the arithmetic literals of type $\|.\|_{\mathbb{N}}$. Each removal step is not defined for arbitrary
normalized TABG$[\approx, \not\approx, \mathbb{N}]$, but just for normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$. For this reason,
we first describe how to transform a given normalized TABG$[\approx, \not\approx, \mathbb{N}]$ into a list of
normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$ such that the union of their languages coincides with the
language of the original TABG$[\approx, \not\approx, \mathbb{N}]$.

Definition 4.12. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a normalized TABG$[\approx, \not\approx, \mathbb{N}]$, such that $C$
is of the form $C_1 \vee C_2 \vee \ldots \vee C_n$ for conjunctive constraints $C_1, C_2, \ldots, C_n$. Let $\mathcal{A}_1 =
(Q, \Sigma, F, \Delta, E, C_1)$, $\mathcal{A}_2 = (Q, \Sigma, F, \Delta, E, C_2)$, $\ldots$, $\mathcal{A}_n = (Q, \Sigma, F, \Delta, E, C_n)$. These automata
are conjunctive and normalized and, moreover, $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}_1) \cup \mathcal{L}(\mathcal{A}_2) \cup \ldots \cup \mathcal{L}(\mathcal{A}_n)$ holds.
We say that $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_n$ is the subdivision of $\mathcal{A}$.

Iteratively, we will transform a list of normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$ into a new
list of automata of the same kind but with simplified constraints, preserving the language. In
order to show that this process terminates, we define a measure on normalized conjunctive
TABG$[\approx, \not\approx, \mathbb{N}]$ which will decrease at each step. Moreover, a case with minimal measure
corresponds to a positive TABG$[\approx, \not\approx, |.|_{\mathbb{N}}]$. This measure is a pair of natural numbers which
depends on the constraint $C$ of the normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$. The first
component is the number of negative literals in $C$. The second component is
the sum of the isolated constants in all arithmetic literals of type $\|.\|_{\mathbb{N}}$ plus
the number of uses of the function symbol $\|.\|$.

Definition 4.13. We define the measure of a normalized conjunctive constraint $C$, denoted
$\langle C \rangle$, as a pair of natural numbers. We describe it by distinguishing the following cases.

• If $C$ is of the form $q_1 \approx q_2$ or $q_1 \not\approx q_2$, then its measure is $(0, 0)$.

• If $C$ is of the form $\neg(q_1 \approx q_2)$ or $\neg(q_1 \not\approx q_2)$, then its measure is $(1, 0)$.

• If $C$ is of the form $(a_1\|q_1\| + \ldots + a_n\|q_n\| \otimes k)$, where $\otimes$ is in $\{=, \geq, \leq\}$, then its measure
is $(0, n + k)$.

• If $C$ is of the form $(a_1|q_1| + \ldots + a_n|q_n| \otimes k)$, where $\otimes$ is in $\{=, \geq, \leq\}$, then its measure
is $(0, 0)$.

• If $C$ is either true or false, then its measure is $(0, 0)$.

• If $C$ is a conjunction of two or more literals $l_1 \wedge l_2 \wedge \ldots \wedge l_n$ with measures $(a_1, b_1)$,
$(a_2, b_2)$, $\ldots$, $(a_n, b_n)$, then its measure is $(a_1 + a_2 + \ldots + a_n, b_1 + b_2 + \ldots + b_n)$.

Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$. The measure of $\mathcal{A}$,
denoted $\langle \mathcal{A} \rangle$, is defined as $\langle C \rangle$.

We say that $\mathcal{A}_1$ is bigger than $\mathcal{A}_2$ (or, equivalently, that $\mathcal{A}_2$ is smaller than $\mathcal{A}_1$), denoted
$\mathcal{A}_1 > \mathcal{A}_2$ (or $\mathcal{A}_2 < \mathcal{A}_1$), if the measure of $\mathcal{A}_1$ is bigger (or smaller) than the measure of $\mathcal{A}_2$,
according to the lexicographic extension of the relation $>$ of natural numbers.
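
The measure is easy to compute, as the following Python sketch suggests. The encoding of literals as tagged tuples (`"eq"`, `"neq"`, `"not"`, `"card"` for type $|.|_{\mathbb{N}}$, `"dist"` for type $\|.\|_{\mathbb{N}}$) is an assumption of this illustration. Python compares tuples lexicographically, which matches the ordering used in the definition.

```python
def literal_measure(literal):
    """Measure of one literal of a normalized conjunctive constraint.

    Assumed encodings:
      ("eq", q1, q2), ("neq", q1, q2)   positive (dis)equality literals
      ("not", inner_literal)            negated (dis)equality literal
      ("card", coeffs, op, k)           a1|q1| + ... + an|qn| op k
      ("dist", coeffs, op, k)           a1||q1|| + ... + an||qn|| op k
    where coeffs maps each state to its strictly positive coefficient.
    """
    kind = literal[0]
    if kind == "not":
        return (1, 0)
    if kind == "dist":
        _, coeffs, _, k = literal
        return (0, len(coeffs) + k)
    return (0, 0)

def measure(conjunction):
    """Componentwise sum of the measures of the literals of a conjunction."""
    pairs = [literal_measure(lit) for lit in conjunction]
    return (sum(a for a, _ in pairs), sum(b for _, b in pairs))

constraint = [
    ("not", ("eq", "q", "p")),
    ("dist", {"q": 2, "p": 1}, "<=", 3),
    ("card", {"q": 1}, ">=", 1),
]
```

The example constraint has one negative literal and one literal of type $\|.\|_{\mathbb{N}}$ with two states and constant $3$, hence measure $(1, 5)$. Removing the negative literal always decreases the measure, whatever happens to the second component, since $(0, b) < (1, 0)$ for every $b$.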

The following lemma shows that any normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$ with
non-minimal measure can be transformed into a list of TABG$[\approx, \not\approx, \mathbb{N}]$ of the same kind with
smaller measures and preserving the language.

Lemma 4.14. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$ whose
measure is not $(0, 0)$.

Then one can construct normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$ $\mathcal{A}_1, \ldots, \mathcal{A}_n$ with the same
equational theory $E$, each of them having a measure smaller than $\langle \mathcal{A} \rangle$ and such that $\mathcal{L}(\mathcal{A}) =
\mathcal{L}(\mathcal{A}_1) \cup \ldots \cup \mathcal{L}(\mathcal{A}_n)$ holds.

Proof. In the case where $C$ has some negative literal $\neg(q \approx q')$ or $\neg(q \not\approx q')$, the
transformations described in Lemmas 4.7 and 4.8 give a new TABG$[\approx, \not\approx, \mathbb{N}]$ $\mathcal{A}'$, and the subdivision
$\mathcal{A}_1, \ldots, \mathcal{A}_n$ of the normalization of $\mathcal{A}'$ (as defined in Definition 4.12) is such that the
constraints $C_{\mathcal{A}_1}, \ldots, C_{\mathcal{A}_n}$ have one less negative literal than $C$. Thus, the measure of each of
these automata is smaller than the measure of $\mathcal{A}$.

In the case where $C$ has no negative literals of the form $\neg(q \approx q')$ or $\neg(q \not\approx q')$, its
measure is of the form $(0, m)$ for $m > 0$. It follows that there is at least one literal of the form
$(a\|q\| + \sum a_i \cdot \|q_i\| \otimes k)$, where $\otimes$ is in $\{=, \geq, \leq\}$. We consider a new state $\bar{q}$ and the automaton
$\mathcal{A}_{q \mapsto \bar{q}}$. Its constraint $C_{q \mapsto \bar{q}}$ is of the form $((\|q\| = 0 \wedge \|\bar{q}\| = 0) \vee (\|\bar{q}\| = 1 \wedge q \not\approx \bar{q})) \wedge C'$.
Note that, according to Definition 4.5, $C'$ is a conjunction because there are no negative
literals of the form $\neg(q \approx q')$ or $\neg(q \not\approx q')$ in $C$. Thus, $C_{q \mapsto \bar{q}}$ can be rewritten as the
disjunction of two conjunctions $C_1$ and $C_2$, where $C_1$ is $\|q\| = 0 \wedge \|\bar{q}\| = 0 \wedge C'$ and $C_2$ is
$\|\bar{q}\| = 1 \wedge q \not\approx \bar{q} \wedge C'$. Hence, the subdivision of the normalization of $\mathcal{A}_{q \mapsto \bar{q}}$ consists of the automata
$\mathcal{A}_1$, $\mathcal{A}_2$ obtained from $\mathcal{A}_{q \mapsto \bar{q}}$ by replacing its constraint by $C_1$ and $C_2$, respectively. The
measures of $C_1$ and $C_2$ may be bigger than the one of $C$. In order to conclude, for each case
we show that additional transformations can be applied to $\mathcal{A}_1$ and $\mathcal{A}_2$, producing automata
with smaller measures than the one of $\mathcal{A}$ and preserving the represented language.

• The literals of $C_1$ of type $\|.\|_{\mathbb{N}}$ are $\|q\| = 0$ and $\|\bar{q}\| = 0$, and those obtained from the
literals of $C$ of type $\|.\|_{\mathbb{N}}$ by replacing $\|q\|$ by $\|q\| + \|\bar{q}\|$. Note that original literals of the
form $(a\|q\| + \sum a_i \cdot \|q_i\| \otimes k)$ have been converted into $(a\|q\| + a\|\bar{q}\| + \sum a_i \cdot \|q_i\| \otimes k)$, and
recall that there is at least one literal of this form in $C$. Applying to $\mathcal{A}_1$ the transformation
described in Lemma 4.11 for $q$ and $\bar{q}$, each one of the above literals is transformed into
$(a \cdot 0 + a \cdot 0 + \sum a_i \cdot \|q_i\| \otimes k)$, which has a smaller measure than the original literal
$(a\|q\| + \sum a_i \cdot \|q_i\| \otimes k)$. Moreover, the literals $\|q\| = 0$ and $\|\bar{q}\| = 0$ are converted into
$|q| = 0$ and $|\bar{q}| = 0$, respectively. In summary, the measure of $((C_1)_{\|q\| \mapsto 0})_{\|\bar{q}\| \mapsto 0}$ is smaller
than the one of $C$.

• Similarly, the literals of $C_2$ of type $\|.\|_{\mathbb{N}}$ are $\|\bar{q}\| = 1$ and those obtained from the literals
of $C$ of type $\|.\|_{\mathbb{N}}$ by replacing $\|q\|$ by $\|q\| + \|\bar{q}\|$. As above, note that original literals of
the form $(a\|q\| + \sum a_i \cdot \|q_i\| \otimes k)$ have been transformed into $(a\|q\| + a\|\bar{q}\| + \sum a_i \cdot \|q_i\| \otimes
k)$, and recall that there is at least one literal of this form in $C$. Applying to $\mathcal{A}_2$ the
transformation described in Lemma 4.10 for $\bar{q}$, each one of the above literals is converted
into $(a \cdot \|q\| + a \cdot 1 + \sum a_i \cdot \|q_i\| \otimes k)$. The normalization of such a literal is the normalization
of $(a \cdot \|q\| + \sum a_i \cdot \|q_i\| \otimes k - a)$, which might be already normalized or must be replaced
by true or false in order to normalize it, depending on $k - a$ and $\otimes$. In every case, the
resulting literal has a smaller measure than the original literal $(a\|q\| + \sum a_i \cdot \|q_i\| \otimes k)$.
Moreover, the literal $\|\bar{q}\| = 1$ is replaced by $|\bar{q}| \geq 1 \wedge \bar{q} \approx \bar{q}$ as a consequence of the
transformation of Lemma 4.10. To summarize, the measure of $(C_2)_{\|\bar{q}\| \mapsto 1}$ is smaller than
the one of $C$. □

Corollary 4.15. Let $\mathcal{A} = (Q, \Sigma, F, \Delta, E, C)$ be a TABG$[\approx, \not\approx, \mathbb{N}]$.

Then, one can construct some TABG$^{\wedge}[\approx, \not\approx, |.|_{\mathbb{N}}]$ $\mathcal{A}_1, \ldots, \mathcal{A}_n$ with the same equational
theory $E$ such that $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}_1) \cup \ldots \cup \mathcal{L}(\mathcal{A}_n)$.

Proof. Without loss of generality, the constraint $C$ can be assumed to be normalized. The
subdivision of $\mathcal{A}$ is a collection of normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$ such that the union
of their languages coincides with $\mathcal{L}(\mathcal{A})$.

By iterated application of Lemma 4.14 to each automaton of the subdivision,
combined with the fact that the ordering on measures is well founded, we can effectively
construct normalized conjunctive TABG$[\approx, \not\approx, \mathbb{N}]$ $\mathcal{A}_1, \ldots, \mathcal{A}_n$ such that $\mathcal{L}(\mathcal{A}) =
\mathcal{L}(\mathcal{A}_1) \cup \ldots \cup \mathcal{L}(\mathcal{A}_n)$ and each of them has measure $(0, 0)$. Such automata are, in
fact, TABG$^{\wedge}[\approx, \not\approx, |.|_{\mathbb{N}}]$, since measure $(0, 0)$ implies that negative literals and literals of type
$\|.\|_{\mathbb{N}}$ do not occur. □

Now, in order to remove all arithmetic constraints, it remains to remove the ones of type
$|.|_{\mathbb{N}}$. This is a rather easy task. For a given TABG$^{\wedge}[\approx, \not\approx, |.|_{\mathbb{N}}]$ $\mathcal{A}$ we create a new TABG$^{\wedge}[\approx, \not\approx]$
$\mathcal{A}_{\mathbb{N}}$ whose purpose is to simulate the computations of $\mathcal{A}$. To this end, the states of $\mathcal{A}_{\mathbb{N}}$
count the number of occurrences of the states of $\mathcal{A}$ in the simulated computation, up to a
certain maximum value. This allows $\mathcal{A}_{\mathbb{N}}$ to check the constraints of type $|.|_{\mathbb{N}}$ of $\mathcal{A}$ directly
through states. Thus, each state of $\mathcal{A}_{\mathbb{N}}$ is of the form $q_M$ for a state $q$ of $\mathcal{A}$ and a mapping
$M : Q_{\mathcal{A}} \to \mathbb{N}$, that is, a mapping counting the number of occurrences of each state.

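Definition 4.16. Let $\mathcal{A} = \langle Q, \Sigma, F, \Delta, E, C\rangle$ be a normalized TABG$^{\wedge}[\approx, \not\approx, |.|_{\mathbb{N}}]$.

We define $\max_{\mathcal{A}}$ as one plus the maximum isolated constant occurring in the literals of
$C$ of type $|.|_{\mathbb{N}}$, i.e. one plus the maximum constant $k$ occurring in a literal of $C$ of the form
$(a_1|q_1| + \ldots + a_n|q_n| \otimes k)$, for $\otimes$ in $\{>, <, =\}$.

Given two mappings $M_1 : Q \to \{0, \ldots, \max_{\mathcal{A}}\}$ and $M_2 : Q \to \{0, \ldots, \max_{\mathcal{A}}\}$, the
sum of $M_1$ and $M_2$ is defined as the mapping $M_1 + M_2 : Q \to \{0, \ldots, \max_{\mathcal{A}}\}$ satisfying
$(M_1 + M_2)(q) = \min(M_1(q) + M_2(q), \max_{\mathcal{A}})$. Given a state $q$ in $Q$ we define $M_q : Q \to
\{0, \ldots, \max_{\mathcal{A}}\}$ as the mapping satisfying $M_q(q) = 1$ and $M_q(q') = 0$ for all $q' \in Q \setminus \{q\}$.
We define $\mathcal{A}_{\mathbb{N}}$ as the TABG$^{\wedge}[\approx, \not\approx]$ $\langle Q_{\mathbb{N}}, \Sigma, F_{\mathbb{N}}, \Delta_{\mathbb{N}}, E, C_{\mathbb{N}}\rangle$, where:

• $Q_{\mathbb{N}}$ is $\{q_M \mid q \in Q \wedge M : Q \to \{0, \ldots, \max_{\mathcal{A}}\}\}$.

• $F_{\mathbb{N}}$ is $\{q_M \in Q_{\mathbb{N}} \mid q \in F \wedge \forall (a_1|q_1| + \ldots + a_n|q_n| \otimes k) \in C, \otimes \in \{>, <, =\} : (a_1 M(q_1) +
\ldots + a_n M(q_n) \otimes k)\}$.

• $\Delta_{\mathbb{N}}$ is $\{f((q_1)_{M_1}, \ldots, (q_m)_{M_m}) \xrightarrow{c} q_{M_1 + \ldots + M_m + M_q} \mid (f(q_1, \ldots, q_m) \xrightarrow{c} q) \in \Delta\}$.

• $C_{\mathbb{N}}$ is $\{q_M \approx q'_{M'} \mid (q \approx q') \in C\} \cup \{q_M \not\approx q'_{M'} \mid (q \not\approx q') \in C\}$.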
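The counting mechanism of this construction can be illustrated with a minimal Python sketch. It is not part of the original construction: the function names (`unit`, `add_capped`, `target_mapping`) and the encoding of mappings as dictionaries are assumptions made only for illustration. It shows how the mapping attached to the target state of a rule is obtained from the mappings of the children by a sum saturated at the cap.

```python
def unit(q, states):
    """Mapping that counts one occurrence of state q and zero of every other state."""
    return {s: (1 if s == q else 0) for s in states}


def add_capped(m1, m2, cap):
    """Pointwise sum of two mappings over the same states, saturated at cap."""
    return {s: min(m1[s] + m2[s], cap) for s in m1}


def target_mapping(q, child_mappings, states, cap):
    """Mapping carried by the target state of a rule reaching q:
    the capped sum of the children's mappings plus one occurrence of q."""
    total = unit(q, states)
    for m in child_mappings:
        total = add_capped(total, m, cap)
    return total
```

For instance, with two states and cap 2, a node reaching `q1` above two leaves reaching `q0` carries the mapping `{"q0": 2, "q1": 1}`. Since all counts are non-negative, saturating after every addition gives the same result as saturating once at the end.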

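Lemma 4.17. Let $\mathcal{A} = \langle Q, \Sigma, F, \Delta, E, C\rangle$ be a TABG$^{\wedge}[\approx, \not\approx, |.|_{\mathbb{N}}]$.
Then, $\mathcal{L}(\mathcal{A}_{\mathbb{N}}) = \mathcal{L}(\mathcal{A})$.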

Proof. The accepting runs of $\mathcal{A}$ can be converted into accepting runs of $\mathcal{A}_{\mathbb{N}}$ and vice versa,
following the transformations described below.

• A run $r_{\mathbb{N}}$ of $\mathcal{A}_{\mathbb{N}}$ can be converted into a run $r$ of $\mathcal{A}$ by replacing each occurrence of a state
$q_M$ by the corresponding state $q$.

• A run $r$ of $\mathcal{A}$ can be converted into a run $r_{\mathbb{N}}$ of $\mathcal{A}_{\mathbb{N}}$. The transformation can be defined
recursively as follows. Let $r$ be a run of the form $(f(q_1, \ldots, q_m) \xrightarrow{c} q)(r_1, \ldots, r_m)$. Let
$(r_1)_{\mathbb{N}}, \ldots, (r_m)_{\mathbb{N}}$ be the transformations of $r_1, \ldots, r_m$, and let $(q_1)_{M_1}, \ldots, (q_m)_{M_m}$ be the
states reached by $(r_1)_{\mathbb{N}}, \ldots, (r_m)_{\mathbb{N}}$, respectively. Then, $r_{\mathbb{N}}$ is
$(f((q_1)_{M_1}, \ldots, (q_m)_{M_m}) \xrightarrow{c} q_{M_1 + \ldots + M_m + M_q})((r_1)_{\mathbb{N}}, \ldots, (r_m)_{\mathbb{N}})$.

Each one of the two above transformations is the inverse of the other. Thus, they describe
a bijection between runs of $\mathcal{A}$ and runs of $\mathcal{A}_{\mathbb{N}}$. Moreover, for each run $r$ of $\mathcal{A}$, the state
$q_M$ reached by $r_{\mathbb{N}}$ is such that each $q' \in Q$ satisfies $M(q') = \min(|r^{-1}(q')|, \max_{\mathcal{A}})$ (note that
$r^{-1}(q')$ is the set of positions reaching state $q'$). Hence, by the definition of $F_{\mathbb{N}}$, it follows
that $q$ is in $F$ and $r$ satisfies the arithmetic constraints of $C$ if and only if $q_M$ is in $F_{\mathbb{N}}$. As
a consequence, $r$ is accepting if and only if $r_{\mathbb{N}}$ is accepting. Thus, $\mathcal{L}(\mathcal{A}_{\mathbb{N}}) = \mathcal{L}(\mathcal{A})$ holds. □

The following corollary is a consequence of Corollary 4.15 combined with Lemma 4.17.

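Corollary 4.18. Let $\mathcal{A} = \langle Q, \Sigma, F, \Delta, E, C\rangle$ be a TABG$[\approx, \not\approx, \mathbb{N}]$.

Then, one can construct some TABG$^{\wedge}[\approx, \not\approx]$ $\mathcal{A}_1, \ldots, \mathcal{A}_n$ with the same equational theory
$E$ such that $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}_1) \cup \ldots \cup \mathcal{L}(\mathcal{A}_n)$.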

As a final step, we show that TABG$^{\wedge}[\approx, \not\approx]$ are closed under union for a fixed $E$.

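Lemma 4.19. Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be TABG$^{\wedge}[\approx, \not\approx]$ with the same equational theory $E$. Then, a
TABG$^{\wedge}[\approx, \not\approx]$ $\mathcal{A}$ with the same equational theory $E$ can be effectively constructed satisfying
$\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}_1) \cup \mathcal{L}(\mathcal{A}_2)$.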

Proof. Let $\mathcal{A}_1$ be $\langle Q_1, \Sigma, F_1, \Delta_1, E, C_1\rangle$ and $\mathcal{A}_2$ be $\langle Q_2, \Sigma, F_2, \Delta_2, E, C_2\rangle$. Without loss of
generality we can assume that the sets of states $Q_1$ and $Q_2$ are disjoint.

In the case where $C_1$ is just false the result follows by defining $\mathcal{A} := \mathcal{A}_2$. Similarly, in
the case where $C_2$ is just false the result follows by defining $\mathcal{A} := \mathcal{A}_1$. From now on we
assume that these cases do not take place.

We define $\mathcal{A}$ as $\langle Q_1 \uplus Q_2, \Sigma, F_1 \uplus F_2, \Delta_1 \uplus \Delta_2, E, C_1 \wedge C_2\rangle$. Note that $\mathcal{A}$ is a TABG$^{\wedge}[\approx, \not\approx]$.
It is clear that any accepting run of $\mathcal{A}$ is also an accepting run of either $\mathcal{A}_1$ or $\mathcal{A}_2$. Moreover,
it can be proved that any accepting run of either $\mathcal{A}_1$ or $\mathcal{A}_2$ is also an accepting run of $\mathcal{A}$.
We show this fact only for $\mathcal{A}_1$, since the case for $\mathcal{A}_2$ is analogous.

Let $r$ be an accepting run of $\mathcal{A}_1$. Then, $r \models C_1$ holds. In order to see that it is, in
fact, an accepting run of $\mathcal{A}$, it remains to prove $r \models C_2$. Since $\mathcal{A}_2$ is a TABG$^{\wedge}[\approx, \not\approx]$, $C_2$ is
a conjunction of positive literals of types $\approx$ and $\not\approx$ applied to states of $Q_2$. Therefore, $r \models C_2$
holds, since $C_2$ is not false and any such positive literal holds vacuously because $r$ uses only states from
$Q_1$. □

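Corollary 4.20. Let $\mathcal{A} = \langle Q, \Sigma, F, \Delta, E, C\rangle$ be a TABG$[\approx, \not\approx, \mathbb{N}]$.

Then, one can construct a TABG$^{\wedge}[\approx, \not\approx]$ $\mathcal{A}'$ with the same equational theory $E$ such that
$\mathcal{L}(\mathcal{A}') = \mathcal{L}(\mathcal{A})$.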

Corollary 4.21. The class of TABG languages (modulo the same equational theory) is closed 
under union. 

In order to complete the closure results for TABG languages under basic set operations, 
we show that they are also closed under intersection, but not under complementation. 

Lemma 4.22. The class of TABG languages (modulo the same equational theory) is closed 
under intersection. 

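Proof. We use a classical Cartesian product of sets of states, with a careful redefinition of
constraints on this product.

More precisely, let $\mathcal{A}_1 = \langle Q_1, \Sigma, F_1, \Delta_1, E, C_1\rangle$ and $\mathcal{A}_2 = \langle Q_2, \Sigma, F_2, \Delta_2, E, C_2\rangle$ be two
TABG. We construct the TABG $\mathcal{A} = \langle Q_1 \times Q_2, \Sigma, F_1 \times F_2, \Delta, E, C\rangle$ where
$\Delta = \{f(\langle q_{1,1}, q_{2,1}\rangle, \ldots, \langle q_{1,n}, q_{2,n}\rangle) \xrightarrow{c_1 \wedge c_2} \langle q_1, q_2\rangle \mid f(q_{i,1}, \ldots, q_{i,n}) \xrightarrow{c_i} q_i \in \Delta_i$ for $i \in \{1, 2\}\}$
and the constraint $C$ is obtained from $C_1 \wedge C_2$ by replacing every atom $q_1 \approx q'_1$ with $q_1, q'_1 \in Q_1$
(respectively $q_2 \approx q'_2$ with $q_2, q'_2 \in Q_2$) by $\bigwedge_{q_2, q'_2 \in Q_2} \langle q_1, q_2\rangle \approx \langle q'_1, q'_2\rangle$
(respectively $\bigwedge_{q_1, q'_1 \in Q_1} \langle q_1, q_2\rangle \approx \langle q'_1, q'_2\rangle$),
and similarly for the atoms $q_1 \not\approx q'_1$, $q_2 \not\approx q'_2$. With this construction, $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}_1) \cap \mathcal{L}(\mathcal{A}_2)$
holds: the left (respectively right) projection of a successful run of $\mathcal{A}$ on a term $t \in T(\Sigma)$ is
a successful run of $\mathcal{A}_1$ (respectively $\mathcal{A}_2$) on $t$, and the product of two successful runs $r_1$ of
$\mathcal{A}_1$ and $r_2$ of $\mathcal{A}_2$, both on the same term $t \in T(\Sigma)$, is a successful run of $\mathcal{A}$ on $t$. □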

Lemma 4.23. The class of TABG languages is not closed under complementation. 

Proof. To prove the statement it suffices to define a language $L$ such that $L$ is not recognizable
by TABG but its complement $\overline{L}$ is. In order to simplify the presentation, we denote terms
of the form $f(g^{n_1}(a), f(g^{n_2}(a), \ldots f(g^{n_{k-1}}(a), g^{n_k}(a)) \ldots))$ simply with $[n_1, n_2, \ldots, n_{k-1}, n_k]$.
Let $L$ be the language defined as:

$L = \{[n_1, \ldots, n_k] \mid k, n_1, \ldots, n_k \in \mathbb{N} \wedge \forall i \in \{1, \ldots, k\} \; \exists! j \in \{1, \ldots, k\} \setminus \{i\} : n_i = n_j\}$

In order to prove that $L$ is not recognizable by TABG, by Corollary 4.20 it suffices to prove
it for TABG$^{\wedge}[\approx, \not\approx]$. We proceed by contradiction, assuming that there exists a TABG$^{\wedge}[\approx, \not\approx]$
$\mathcal{A}$ such that $\mathcal{L}(\mathcal{A}) = L$. Let $t \in L$ be the term $[1, \ldots, n, n, \ldots, 1]$, where $n > |Q_{\mathcal{A}}|$, and let
$r$ be an accepting run of $\mathcal{A}$ on $t$. By the pigeonhole principle, there exist $i, j \in \{1, \ldots, n\}$,
with $i < j$, such that the positions $p_i = 2^{i-1}$ and $p_j = 2^{j-1}$ (that is, the sequences of $i - 1$ and
$j - 1$ occurrences of 2, respectively) satisfy $r(p_i) = r(p_j)$.

Let $r'$ be the replacement $r[r|_{p_j}]_{p_i}$. Note that $r'$ is an accepting run of $ta(\mathcal{A})$ on the term
$[1, \ldots, i - 1, j, \ldots, n, n, \ldots, 1]$, which is not in $L$. To conclude, it remains to prove that
the constraints of $\mathcal{A}$ are satisfied in $r'$. First, note that this replacement only introduces
new subterms at the positions $P = \{p \in Pos(r) \mid p < p_i\}$. Moreover, the rules applied
by $r'$ at positions in $P$ are the same as in $r$, and any constraint affecting a position in
$P$ in $r$ is necessarily a disequality, since $\mathrm{term}(r|_p) \neq_{E_{\mathcal{A}}} \mathrm{term}(r|_{p'})$ holds for $p \in P$ and
$p' \in Pos(r) \setminus \{p\}$. By the definition of $r'$, necessarily $\mathrm{term}(r'|_p) \neq_{E_{\mathcal{A}}} \mathrm{term}(r'|_{p'})$ holds also
for $p \in P$ and $p' \in Pos(r') \setminus \{p\}$. Therefore, $r'$ satisfies all the constraints, and hence, $r'$ is
an accepting run of $\mathcal{A}$, a contradiction.

It remains to prove that $\overline{L}$ can be recognized by a TABG. We start by decomposing
$\overline{L}$ into simpler languages. First, let $L_1$ be the language of the malformed terms, i.e. the
terms over $\{f : 2, g : 1, a : 0\}$ that are not of the form $[n_1, \ldots, n_k]$. Second, let $L_2$ be
the language of the well-formed terms $[n_1, \ldots, n_k]$ such that for some $i \in \{1, \ldots, k\}$ there
exists no $j \in \{1, \ldots, k\} \setminus \{i\}$ satisfying $n_i = n_j$. Third, let $L_3$ be the language of the
well-formed terms $[n_1, \ldots, n_k]$ such that there exist different $i_1, i_2, i_3 \in \{1, \ldots, k\}$ satisfying
$n_{i_1} = n_{i_2} = n_{i_3}$. It is easy to see that $\overline{L} = L_1 \cup L_2 \cup L_3$. Moreover, note that $L_1$ can be
recognized by a TA, $L_2$ can be recognized by a TABG$^{\wedge}[\not\approx, |.|_{\mathbb{N}}]$ and $L_3$ can be recognized by
a TABG$^{\wedge}[\approx, |.|_{\mathbb{N}}]$. By Corollaries 4.20 and 4.21 this concludes the proof. □
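The decomposition of the complement used in this proof can be checked on small instances with the following Python sketch. It works directly on the lists of exponents of well-formed terms rather than on terms or automata, so the malformed-term language is out of scope; the function names (`in_L`, `in_L2`, `in_L3`) are chosen here for illustration only.

```python
from collections import Counter


def in_L(ns):
    """Every entry has exactly one other equal entry: each value occurs exactly twice."""
    return all(c == 2 for c in Counter(ns).values())


def in_L2(ns):
    """Some entry has no other equal entry: some value occurs exactly once."""
    return any(c == 1 for c in Counter(ns).values())


def in_L3(ns):
    """Three different indexes carry the same value: some value occurs at least three times."""
    return any(c >= 3 for c in Counter(ns).values())
```

On well-formed inputs, a list is outside the language exactly when it satisfies `in_L2` or `in_L3`, which mirrors the equality of the complement with the union of the three simpler languages. The witness term of the non-recognizability argument corresponds to lists such as `[1, 2, 3, 3, 2, 1]`.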

5. Emptiness Decision Algorithm 

In this section we prove the decidability of the emptiness problem for TABG$^{\wedge}$. As a consequence
of this result and the results of Section 4, decidability of emptiness follows for
TABG, and even more, for TABG$[\approx, \not\approx, \mathbb{N}]$.

The decidability of emptiness for TABG$^{\wedge}$ is proved in three steps. In Subsection 5.1 we
present a new notion of pumping which allows us to transform a run into a smaller run under
certain conditions. In Subsection 5.2 we define a well quasi-ordering $\leq$ on a certain set $S$.
In Subsection 5.3 we connect the two previous subsections by describing how to compute,
for each run $r$ with height $h = h(r)$, a certain sequence $e_h, \ldots, e_1$ of elements of $S$ satisfying
the following fact: there exists a pumping on $r$ if and only if $e_i \leq e_j$ for some $h \geq i > j \geq 1$.
Moreover, each element of the computed sequence is chosen among a finite number of possibilities.
Finally, all of these constructions are used as follows. Suppose the existence of an accepting
run $r$. If $r$ is "too high", the fact that $\leq$ is a well quasi-ordering and the properties of
the sequence imply the existence of such $i, j$. Thus, it follows the existence of a pumping
providing a smaller accepting run $r'$. We conclude the existence of a computable bound
for the height of a minimum accepting run, and hence, decidability of emptiness.

Figure 4: $H_i$, $\check{H}_i$ and $\hat{H}_i$ of Example 5.2

5.1. Global Pumpings. Pumping is a traditional concept in automata theory, and in 
particular, it is very useful in order to reason about tree automata. The basic idea is to 
convert a given run $r$ into another run by replacing a subrun at a certain position $p$ in $r$
by a run $r'$, thus obtaining a run $r[r']_p$. Pumpings are useful for deciding emptiness: if a
"big" run can always be reduced by a pumping, then emptiness can be decided by
searching for an accepting "small" run.

For plain tree automata, a necessary and sufficient condition to ensure that $r[r']_p$ is
a run is that the resulting states of $r|_p$ and $r'$ coincide, since the correct application of a
rule at a certain position depends only on the resulting states of the subruns of the direct
children. In this case, an accepting run with height bounded by the number of states exists,
whenever the accepted language is not empty.
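For plain tree automata, the height bound above justifies the textbook emptiness test by state reachability, sketched below in Python. This is the standard procedure for automata without constraints, not the algorithm developed in this section; the encoding of rules as triples `(symbol, child_states, target)` and the function names are assumptions made for illustration.

```python
def reachable_states(rules):
    """Least set of states closed under the rules.

    rules: iterable of (symbol, tuple_of_child_states, target_state).
    Every round that changes the set adds at least one state, so the
    loop stabilizes after at most (number of states) productive rounds.
    """
    rules = list(rules)
    reached = set()
    changed = True
    while changed:
        changed = False
        for _symbol, children, target in rules:
            if target not in reached and all(c in reached for c in children):
                reached.add(target)
                changed = True
    return reached


def is_empty(rules, final_states):
    """The language is empty iff no final state is reachable."""
    return reachable_states(rules).isdisjoint(final_states)
```

With equality and disequality constraints this reachability argument is no longer sound, since replacing a subrun may falsify a constraint elsewhere in the run.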

When the tree automaton has equality and disequality constraints, the constraints may
be falsified when replacing a subrun by a new run. For TABG$^{\wedge}$, we will define a notion of
pumping ensuring that the constraints are satisfied. This notion of pumping requires
performing several replacements in parallel. We first define the sets of positions involved in
this kind of pumping.

Definition 5.1. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $i$ be an integer between 1 and
$h(r)$. We define

$H_i$ as $\{p \in Pos(r) \mid 0 < h(r|_p) = i\}$,

$\check{H}_i$ as $\{p.j \in Pos(r) \mid 0 < h(r|_{p.j}) < i \wedge h(r|_p) > i\}$,

$\hat{H}_i$ as $\{p.j \in Pos(r) \mid 0 = h(r|_{p.j}) < i \wedge h(r|_p) > i\}$.
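The three sets of this definition depend only on the heights of the subruns, so they can be computed on the underlying term. The following Python sketch does so under several assumptions that are not taken from the original text: terms are encoded as nested tuples `(symbol, child_1, ..., child_m)`, positions as tuples of child indexes, a constant has height 0, and the names `H`, `H_check` and `H_hat` for the three sets are chosen here for readability.

```python
def height(t):
    """Height of a term; a constant (no children) has height 0 (assumption)."""
    children = t[1:]
    return 0 if not children else 1 + max(height(c) for c in children)


def subterm_positions(t, pos=()):
    """Yield (position, subterm) for every position of t."""
    yield pos, t
    for k, child in enumerate(t[1:], start=1):
        yield from subterm_positions(child, pos + (k,))


def h_sets(t, i):
    """Return the three sets of positions for index i.

    H: positions of height exactly i.
    H_check: positions of height in (0, i) whose parent has height > i.
    H_hat: positions of height 0 whose parent has height > i.
    """
    heights = {p: height(s) for p, s in subterm_positions(t)}
    H, H_check, H_hat = set(), set(), set()
    for p, h in heights.items():
        if 0 < h == i:
            H.add(p)
        elif p and h < i and heights[p[:-1]] > i:
            (H_check if h > 0 else H_hat).add(p)
    return H, H_check, H_hat
```

For the term `f(g(g(g(a))), g(a))` and index 2, the sketch returns the position `1.1` in the first set and the position `2` in the second one: the subterm at `2` is lower than 2 but hangs directly below the root, whose height exceeds 2.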

Example 5.2. According to Definition 5.1, for our running example (Example 3.4), we
have the $H_i$, $\check{H}_i$ and $\hat{H}_i$ presented in Figure 4.

The following lemma is rather straightforward from the previous definition. 

Lemma 5.3. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $i$ be an integer between 1 and
$h(r)$. Then, any two different positions in $H_i \cup \check{H}_i \cup \hat{H}_i$ are parallel, and for any arbitrary
position $p$ in $Pos(r)$ there is a position $\bar{p}$ in $H_i \cup \check{H}_i \cup \hat{H}_i$ such that, either $p$ is a prefix of
$\bar{p}$, or $\bar{p}$ is a prefix of $p$.

Figure 5: Global pumping of Example 5.5

Proof. For the first fact, note that any proper prefix $p$ of a position $\bar{p}$ in $H_i \cup \check{H}_i \cup \hat{H}_i$ satisfies
$h(r|_p) > i$. Thus, such a $p$ is not in $H_i \cup \check{H}_i \cup \hat{H}_i$. For the second fact, consider any $p$ in
$Pos(r)$. If $h(r|_p) \leq i$ holds, then the smallest position $\bar{p}$ satisfying $\bar{p} \leq p$ and $h(r|_{\bar{p}}) \leq i$ is
in $H_i \cup \check{H}_i \cup \hat{H}_i$, and we are done. Otherwise, if $h(r|_p) > i$ holds, then the smallest position
$\bar{p}$ of the form $p.1.\cdots.1$ and satisfying $h(r|_{\bar{p}}) \leq i$ is in $H_i \cup \check{H}_i \cup \hat{H}_i$, and we are done. □

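Definition 5.4. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $E$ be $E_{\mathcal{A}}$. Let $r$ be a run of $\mathcal{A}$. Let $i, j$ be integers
satisfying $1 \leq j < i \leq h(r)$. A pump-injection $I : (H_i \cup \check{H}_i \cup \hat{H}_i) \to (H_j \cup \check{H}_j \cup \hat{H}_j)$ is an
injective function such that the following conditions hold:

(C1) $I(H_i) \subseteq H_j$, $I(\check{H}_i) \subseteq \check{H}_j$ and $I(\hat{H}_i) \subseteq \hat{H}_j$. Moreover, $I$ restricted to $\hat{H}_i$ is the identity,
i.e. $I(p) = p$ for each $p$ in $\hat{H}_i$.

(C2) For each $p$ in $H_i \cup \check{H}_i \cup \hat{H}_i$, $r(p) = r(I(p))$.

(C3) For each $p_1, p_2$ in $H_i \cup \check{H}_i \cup \hat{H}_i$, $(\mathrm{term}(r|_{p_1}) =_E \mathrm{term}(r|_{p_2})) \Leftrightarrow (\mathrm{term}(r|_{I(p_1)}) =_E
\mathrm{term}(r|_{I(p_2)}))$.

Let $\{p_1, \ldots, p_n\}$ be $H_i \cup \check{H}_i \cup \hat{H}_i$ more explicitly written. The run $r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$
is called a global pumping on $r$ with indexes $i, j$, and injection $I$.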
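The parallel replacement that builds a global pumping can be sketched in Python at the level of terms. States and constraints are ignored, and the conditions on the injection are not checked, so this is only an illustration of the replacement itself. The encoding of terms as nested tuples `(symbol, child_1, ..., child_m)`, of positions as tuples of child indexes, and of the injection as a dictionary are assumptions of this sketch.

```python
def subterm(t, pos):
    """Subterm of t at position pos; child k is stored at index k of the tuple."""
    for k in pos:
        t = t[k]
    return t


def replace_at(t, pos, new):
    """Copy of t in which the subterm at position pos is replaced by new."""
    if not pos:
        return new
    k = pos[0]
    return t[:k] + (replace_at(t[k], pos[1:], new),) + t[k + 1:]


def global_pumping(t, injection):
    """Replace, for every p in the domain, the subterm at p by the subterm
    that the ORIGINAL term has at injection[p].

    The positions of the domain are assumed to be pairwise parallel, so the
    replacements do not interfere and their order is irrelevant.
    """
    result = t
    for p, target in injection.items():
        result = replace_at(result, p, subterm(t, target))
    return result
```

For example, on `f(g(g(a)), b)` the dictionary `{(1,): (1, 1), (2,): (2,)}` moves the subterm at `1.1` up to position `1` and leaves the constant at `2` untouched, which yields the lower term `f(g(a), b)`.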

By Condition (C2), $r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$ is a run of $ta(\mathcal{A})$, but it is still necessary to
prove that it is a run of $\mathcal{A}$. By abuse of notation, when we write $r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$,
we sometimes consider that $I$ and $\{p_1, \ldots, p_n\}$ are still explicit, and say that it is a global
pumping with some indexes $1 \leq j < i \leq h(r)$.

Example 5.5. Following our running example, we define a pump-injection $I : (H_4 \cup \check{H}_4 \cup
\hat{H}_4) \to (H_3 \cup \check{H}_3 \cup \hat{H}_3)$ as follows: $I(1) = 1$, $I(2) = 2$, $I(3) = 3.3$. We note that $I$ is a
correct pump-injection: $I(H_4) \subseteq H_3$, $I(\check{H}_4) \subseteq \check{H}_3$ and $I(\hat{H}_4) \subseteq \hat{H}_3$ hold, and $I$ restricted
to $\hat{H}_4$ is, in fact, the identity, thus (C1) holds. For (C2), we have $r(1) = r(I(1)) = q_{id}$,
$r(2) = r(I(2)) = q_t$, and $r(3) = r(I(3)) = q_l$. Regarding (C3), for each different $p_1, p_2$
in $H_4 \cup \check{H}_4 \cup \hat{H}_4$, $\mathrm{term}(r|_{p_1}) \neq \mathrm{term}(r|_{p_2})$ and $\mathrm{term}(r|_{I(p_1)}) \neq \mathrm{term}(r|_{I(p_2)})$ hold. After
applying the pump-injection $I$, we obtain the term and run $r'$ of Figure 5.

Our goal is to prove that any global pumping $r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$ is a run, and
in particular, that all equality and disequality constraints are satisfied. To this end we
first state the following intermediate statement, which determines the height of the terms
pending at some positions after the pumping. It can be easily proved by induction on the
height of the involved term.

Lemma 5.6. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $r'$ be the global pumping
$r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$ on $r$ with indexes $1 \leq j < i \leq h(r)$ and injection $I$. Let $k \geq 0$ be a
natural number and let $p$ be a position of $r$ such that $h(r|_p)$ is $i + k$.
Then, $p$ is also a position of $r'$ and $h(r'|_p)$ is $j + k$.

Proof. Position $p$ is obviously a position of $r'$ since no position in $H_i \cup \check{H}_i \cup \hat{H}_i$ is a proper
prefix of $p$. We prove the second part of the statement by induction on $k$. First, assume $k =
0$. Then, $h(r|_p)$ is $i$. Thus, $p$ is in $H_i$, say $p$ is $p_\ell$. Therefore, $r'|_p$ is $r|_{I(p_\ell)}$. By Condition (C1)
of the definition of pump-injection, $I(p_\ell) \in H_j$ holds. Hence, $h(r'|_p) = h(r|_{I(p_\ell)}) = j$.

Now, assume $k > 0$. Let $m$ be the arity of $\mathrm{symbol}(r|_p)$. Thus, $p.1, \ldots, p.m$ are all the
child positions of $p$ in $r$. Since $h(r|_p)$ is $i + k$, all $h(r|_{p.1}), \ldots, h(r|_{p.m})$ are smaller than or
equal to $i + k - 1$, and at least one of them is equal to $i + k - 1$.

Consider any $\alpha$ in $\{1, \ldots, m\}$. If $h(r|_{p.\alpha})$ is $i + k'$ for some $0 \leq k' \leq k - 1$, then, by
induction hypothesis, $h(r'|_{p.\alpha})$ is $j + k'$. Otherwise, if $h(r|_{p.\alpha})$ is strictly smaller than $i$,
then $p.\alpha$ is one of the positions in $\check{H}_i \cup \hat{H}_i$, say $p_\ell$. In this case, $r'|_{p_\ell}$ is $r|_{I(p_\ell)}$, and by
Condition (C1) of the definition of $I$, $I(p_\ell)$ belongs to $\check{H}_j \cup \hat{H}_j$. Therefore, $h(r|_{I(p_\ell)}) < j$
holds, and hence, $h(r'|_{p.\alpha}) = h(r'|_{p_\ell}) = h(r|_{I(p_\ell)}) < j \leq j + k - 1$ holds.

From the above cases we conclude that, if $h(r|_{p.\alpha})$ is $i + k - 1$, then $h(r'|_{p.\alpha})$ is $j + k - 1$,
and if $h(r|_{p.\alpha})$ is smaller than $i + k - 1$, then $h(r'|_{p.\alpha})$ is smaller than $j + k - 1$. It follows
that all $h(r'|_{p.1}), \ldots, h(r'|_{p.m})$ are smaller than or equal to $j + k - 1$, and at least one of
them is equal to $j + k - 1$. As a consequence, $h(r'|_p)$ is $j + k$. □

Corollary 5.7. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $r'$ be a global pumping on $r$.
Then, $h(r') < h(r)$.

The following lemma states that equality and disequality relations are preserved, not 
only for terms pending at the positions of the domain of /, but also for terms pending at 
prefixes of positions of such domain. Again, it is rather easy to prove by induction on the 
height of the involved terms. 

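Lemma 5.8. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $r'$ be the global pumping
$r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$ on $r$ with indexes $1 \leq j < i \leq h(r)$ and injection $I$. Let $p_1, p_2$ be positions
of $r$ satisfying that each of them is a prefix of a position in $H_i \cup \check{H}_i \cup \hat{H}_i$.

Then, $p_1, p_2$ are positions of $r'$ and $(\mathrm{term}(r|_{p_1}) =_E \mathrm{term}(r|_{p_2})) \Leftrightarrow (\mathrm{term}(r'|_{p_1}) =_E
\mathrm{term}(r'|_{p_2}))$ holds.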

Proof. The first statement follows by Lemma 5.6. We prove the second part of the statement
by induction on $h(r|_{p_1}) + h(r|_{p_2})$. We distinguish the following cases:

• Assume that both $p_1$ and $p_2$ are positions in $H_i \cup \check{H}_i \cup \hat{H}_i$.
Therefore, $r'|_{p_1}$ is $r|_{I(p_1)}$ and $r'|_{p_2}$ is $r|_{I(p_2)}$. By Condition (C3) of the definition of
pump-injection, $(\mathrm{term}(r|_{p_1}) =_E \mathrm{term}(r|_{p_2})) \Leftrightarrow (\mathrm{term}(r|_{I(p_1)}) =_E \mathrm{term}(r|_{I(p_2)}))$ holds. Thus,
$(\mathrm{term}(r|_{p_1}) =_E \mathrm{term}(r|_{p_2})) \Leftrightarrow (\mathrm{term}(r'|_{p_1}) =_E \mathrm{term}(r'|_{p_2}))$ holds, and we are done.

• Assume that one of $p_1$ or $p_2$, say $p_1$, is a proper prefix of a position in $H_i \cup \check{H}_i \cup \hat{H}_i$, and
$p_2$ is a position in $H_i \cup \check{H}_i \cup \hat{H}_i$. Then, $h(r|_{p_1}) = i + k$ for some $k > 0$, and $h(r|_{p_2}) \leq i$
holds. Thus, $(\mathrm{term}(r|_{p_1}) \neq_E \mathrm{term}(r|_{p_2}))$ holds. By Lemma 5.6, $h(r'|_{p_1}) = j + k$. By the
definition of pump-injection, $h(r'|_{p_2}) \leq j$. Thus, also $(\mathrm{term}(r'|_{p_1}) \neq_E \mathrm{term}(r'|_{p_2}))$ holds,
and we are done.

• Assume that both $p_1$ and $p_2$ are proper prefixes of positions in $H_i \cup \check{H}_i \cup \hat{H}_i$. Note that,
in this case, $\mathrm{symbol}(r'|_{p_1}) = \mathrm{symbol}(r|_{p_1})$ and $\mathrm{symbol}(r'|_{p_2}) = \mathrm{symbol}(r|_{p_2})$ hold. Let
$\mathrm{symbol}(r|_{p_1})$ and $\mathrm{symbol}(r|_{p_2})$ be $f$ and $g$, with arities $n$ and $m$, respectively. Recall
that $I$ is the identity for the positions in $\hat{H}_i$, and hence, a position $\alpha$ in $\{1, \ldots, n\}$ satisfies
$\mathrm{symbol}(r|_{p_1.\alpha}) \in \Sigma_0 \Leftrightarrow \mathrm{symbol}(r'|_{p_1.\alpha}) \in \Sigma_0$, and $\mathrm{symbol}(r|_{p_1.\alpha}), \mathrm{symbol}(r'|_{p_1.\alpha}) \in
\Sigma_0 \Rightarrow \mathrm{symbol}(r|_{p_1.\alpha}) = \mathrm{symbol}(r'|_{p_1.\alpha})$. Similarly, a position $\beta$ in $\{1, \ldots, m\}$ satisfies
$\mathrm{symbol}(r|_{p_2.\beta}) \in \Sigma_0 \Leftrightarrow \mathrm{symbol}(r'|_{p_2.\beta}) \in \Sigma_0$, and $\mathrm{symbol}(r|_{p_2.\beta}), \mathrm{symbol}(r'|_{p_2.\beta}) \in
\Sigma_0 \Rightarrow \mathrm{symbol}(r|_{p_2.\beta}) = \mathrm{symbol}(r'|_{p_2.\beta})$. Moreover, since such positions $p_1.\alpha$ and $p_2.\beta$
are prefixes of positions in $H_i \cup \check{H}_i \cup \hat{H}_i$, by induction hypothesis, $(\mathrm{term}(r|_{p_1.\alpha}) =_E
\mathrm{term}(r|_{p_2.\beta})) \Leftrightarrow (\mathrm{term}(r'|_{p_1.\alpha}) =_E \mathrm{term}(r'|_{p_2.\beta}))$ for all such $\alpha$ in $\{1, \ldots, n\}$ and $\beta$ in
$\{1, \ldots, m\}$. By Lemma 2.1, $(\mathrm{term}(r|_{p_1}) =_E \mathrm{term}(r|_{p_2})) \Leftrightarrow (\mathrm{term}(r'|_{p_1}) =_E \mathrm{term}(r'|_{p_2}))$
follows, and we are done. □

Now we prove that the result of a global pumping preserves the satisfaction of the global 
constraints. 

Lemma 5.9. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $r'$ be the global pumping
$r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$ on $r$ with indexes $1 \leq j < i \leq h(r)$ and injection $I$.
Then, $r'$ satisfies all global constraints of $\mathcal{A}$.

Proof. Let us consider two different positions $p_1, p_2$ of $Pos(r')$ involved in the constraint
$C_{\mathcal{A}}$, i.e. either $r'(p_1) \approx r'(p_2)$ or $r'(p_1) \not\approx r'(p_2)$ occurs in $C_{\mathcal{A}}$. According to Lemma 5.3 we
can distinguish the following cases:

• Suppose that a position in $H_i \cup \check{H}_i \cup \hat{H}_i$, say $\bar{p}_1$, is a prefix of both $p_1, p_2$. Then, $r'|_{p_1} =
r|_{I(\bar{p}_1).(p_1 - \bar{p}_1)}$ and $r'|_{p_2} = r|_{I(\bar{p}_1).(p_2 - \bar{p}_1)}$ hold. Hence, $r'|_{p_1}$ and $r'|_{p_2}$ are also subruns of $r$
occurring at different positions. Thus, since $r$ is a run, they satisfy the atom involving
$r'(p_1)$ and $r'(p_2)$.

• Suppose that two different positions in $H_i \cup \check{H}_i \cup \hat{H}_i$, say $\bar{p}_1$ and $\bar{p}_2$, are prefixes of $p_1$
and $p_2$, respectively. Then, $r'|_{p_1} = r|_{I(\bar{p}_1).(p_1 - \bar{p}_1)}$ and $r'|_{p_2} = r|_{I(\bar{p}_2).(p_2 - \bar{p}_2)}$ hold. By the
injectivity of $I$, $I(\bar{p}_1) \neq I(\bar{p}_2)$ holds. Moreover, by Lemma 5.3, $I(\bar{p}_1) \parallel I(\bar{p}_2)$ holds.
Hence, as before, $r'|_{p_1}$ and $r'|_{p_2}$ are subruns of $r$ occurring at different (in fact, parallel)
positions. Thus, they satisfy the atom involving $r'(p_1)$ and $r'(p_2)$.

• Suppose that one of $p_1, p_2$, say $p_1$, is a proper prefix of a position in $H_i \cup \check{H}_i \cup \hat{H}_i$, and
that $p_2$ satisfies that some position in $H_i \cup \check{H}_i \cup \hat{H}_i$ is a prefix of $p_2$. It follows that
$h(r'|_{p_2})$ is smaller than or equal to $j$, and $r'|_{p_2}$ is also a subrun of $r$. Moreover, $p_1$ is
also a position of $r$, $r'(p_1) = r(p_1)$ holds, and $h(r|_{p_1}) = i + k$ holds for some $k > 0$.
Hence, $\mathrm{term}(r|_{p_1}) \neq_E \mathrm{term}(r'|_{p_2})$ holds. Since $r$ is a run and $r'|_{p_2}$ is a subrun of $r$, the
atom involving $r(p_1)$ and $r'(p_2)$ is necessarily of the form $r(p_1) \not\approx r'(p_2)$. Thus, the atom
involving $r'(p_1)$ and $r'(p_2)$ is necessarily of the form $r'(p_1) \not\approx r'(p_2)$. By Lemma 5.6,
$h(r'|_{p_1})$ is $j + k$. Therefore, also $\mathrm{term}(r'|_{p_1}) \neq_E \mathrm{term}(r'|_{p_2})$ holds, and hence, such an
atom is satisfied for such positions in $r'$.

• Suppose that both $p_1, p_2$ are proper prefixes of positions in $H_i \cup \check{H}_i \cup \hat{H}_i$. Then, $p_1, p_2$ are
positions of $r$ satisfying $h(r|_{p_1}), h(r|_{p_2}) > i$. Moreover, $r(p_1) = r'(p_1)$ and $r(p_2) = r'(p_2)$
hold. Since $r$ is a run, the atom involving $r(p_1)$ and $r(p_2)$ is satisfied in the run $r$ for
positions $p_1$ and $p_2$. By Lemma 5.8, $(\mathrm{term}(r|_{p_1}) =_E \mathrm{term}(r|_{p_2})) \Leftrightarrow (\mathrm{term}(r'|_{p_1}) =_E
\mathrm{term}(r'|_{p_2}))$ holds. Thus, the atom involving $r'(p_1)$ and $r'(p_2)$ is satisfied in the run $r'$ for
positions $p_1$ and $p_2$. □
Finally, we prove that the result of a global pumping preserves the satisfaction of the
constraints between brothers.

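Lemma 5.10. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $r'$ be the global pumping
$r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$ on $r$ with indexes $1 \leq j < i \leq h(r)$ and injection $I$.
Then, $r'$ satisfies all constraints between brothers of $\mathcal{A}$.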

Proof. Let us consider a position $p$ of $Pos(r')$ and two positions $i_1, i_2$ involved in a constraint
of the rule used at position $p$ in $r'$, i.e. either $\gamma = (i_1 \approx i_2)$ or $\gamma = (i_1 \not\approx i_2)$ occurs in this
constraint. According to Lemma 5.3, we can distinguish the following cases:

• Suppose that a position in $H_i \cup \check{H}_i \cup \hat{H}_i$ is a prefix of $p$. Then, $r'|_p$ is also a subrun of $r$.
Thus, since $r$ is a run, the constraint is satisfied.

• Suppose that $p$ is a proper prefix of a position in $H_i \cup \check{H}_i \cup \hat{H}_i$. Then, $p.i_1$ and $p.i_2$ are
prefixes of positions in $H_i \cup \check{H}_i \cup \hat{H}_i$. By Lemma 5.8, $(\mathrm{term}(r|_{p.i_1}) =_E \mathrm{term}(r|_{p.i_2})) \Leftrightarrow
(\mathrm{term}(r'|_{p.i_1}) =_E \mathrm{term}(r'|_{p.i_2}))$ holds. Since $r$ is a run, it follows that $(\mathrm{term}(r|_{p.i_1}) =_E
\mathrm{term}(r|_{p.i_2})) \Leftrightarrow \gamma = (i_1 \approx i_2)$. Thus, $(\mathrm{term}(r'|_{p.i_1}) =_E \mathrm{term}(r'|_{p.i_2})) \Leftrightarrow \gamma = (i_1 \approx i_2)$
holds. Hence, the atom involving $i_1$ and $i_2$ is satisfied in the run $r'$ for position $p$. □

As a consequence of the previous lemmas, we have that the result of a global pumping 
satisfies all constraints. 

Corollary 5.11. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $r'$ be the global pumping
$r[r|_{I(p_1)}]_{p_1} \ldots [r|_{I(p_n)}]_{p_n}$ on $r$ with indexes $1 \leq j < i \leq h(r)$ and injection $I$.
Then, $r'$ is a run of $\mathcal{A}$.

5.2. A well quasi-ordering. In this subsection we define a well quasi-ordering. It assures 
the existence of a computational bound for certain sequences of elements of the correspond- 
ing well quasi-ordered set. It will be connected with global pumpings in the next subsection. 

Definition 5.12. Let $\leq$ denote the usual quasi-ordering on natural numbers. Let $n$ be a
natural number.

We define the extension of $\leq$ to $n$-tuples of natural numbers as $(x_1, \ldots, x_n) \leq (y_1, \ldots, y_n)$
if $x_i \leq y_i$ for each $i$ in $\{1, \ldots, n\}$. We define $\mathrm{sum}((x_1, \ldots, x_n)) := x_1 + \cdots + x_n$.

We define the extension of $\leq$ to multisets of $n$-tuples of natural numbers as $[e_1, \ldots, e_\alpha] \leq
[e'_1, \ldots, e'_\beta]$ if there is an injection $I : \{1, \ldots, \alpha\} \to \{1, \ldots, \beta\}$ satisfying $e_i \leq e'_{I(i)}$ for each $i$
in $\{1, \ldots, \alpha\}$. We define $\mathrm{sum}([e_1, \ldots, e_\alpha]) := \mathrm{sum}(e_1) + \cdots + \mathrm{sum}(e_\alpha)$.

We define the extension of $\leq$ to pairs of multisets of $n$-tuples of natural numbers as
$(P_1, \check{P}_1) \leq (P_2, \check{P}_2)$ if $P_1 \leq P_2$ and $\check{P}_1 \leq \check{P}_2$.
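Deciding the extension of the ordering to multisets amounts to finding an injection, which is a bipartite matching problem. The following Python sketch uses a simple augmenting-path search for this purpose. The choice of algorithm, the encoding of multisets as lists of tuples, and the function names are assumptions of this illustration and are not prescribed by the definition.

```python
def tuple_leq(x, y):
    """Componentwise comparison of two n-tuples of natural numbers."""
    return len(x) == len(y) and all(a <= b for a, b in zip(x, y))


def multiset_leq(xs, ys):
    """True iff some injection I satisfies xs[i] <= ys[I(i)] for every index i.

    xs, ys: lists of n-tuples, read as multisets.
    """
    match = {}  # index in ys -> index in xs currently assigned to it

    def try_assign(i, seen):
        # Look for a free slot for xs[i], possibly re-assigning earlier choices.
        for j, y in enumerate(ys):
            if j not in seen and tuple_leq(xs[i], y):
                seen.add(j)
                if j not in match or try_assign(match[j], seen):
                    match[j] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(len(xs)))


def pair_leq(p, q):
    """Extension to pairs of multisets: both components must be related."""
    return multiset_leq(p[0], q[0]) and multiset_leq(p[1], q[1])
```

A greedy assignment would not be enough: for `[(0, 0), (1, 1)]` against `[(1, 1), (0, 0)]` the first tuple may initially take the slot `(1, 1)`, and the search has to move it to `(0, 0)` afterwards.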

As a direct consequence of Higman's Lemma [Gal91] we have the following:

Lemma 5.13. Given $n$, $\leq$ is a well quasi-ordering for pairs of multisets of $n$-tuples of
natural numbers.

In any infinite sequence $e_1, e_2, \ldots$ of elements from a well quasi-ordered set there always
exist two indexes $i < j$ satisfying $e_i \leq e_j$. In general, this fact does not imply the existence
of a bound for the length of sequences without such indexes. For example, the relation
$\leq$ between natural numbers is a well quasi-ordering, but there may exist arbitrarily long
sequences $x_1, \ldots, x_k$ of natural numbers such that $x_i > x_j$ for all $1 \leq i < j \leq k$. In order to
bound the length of such sequences, it is sufficient to force that the first element and each
next element of the sequence are chosen among a finite number of possibilities. Indeed, in
this case, by König's lemma, the prefix tree describing all such (finite) sequences is
finite. As a particular case of this fact we have the following result (the proof is standard,
but we include it for completeness).

Lemma 5.14. There exists a computable function $B : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ such that, given two
natural numbers $a, n$, $B(a, n)$ is a bound for the length $\ell$ of any sequence $(T_1, \check{T}_1), \ldots, (T_\ell, \check{T}_\ell)$
of pairs of multisets of $n$-tuples of natural numbers such that the following conditions hold:

(1) The tuple $(0, \ldots, 0)$ does not occur in any $T_i$, $\check{T}_i$ for $i$ in $\{1, \ldots, \ell\}$.

(2) $\mathrm{sum}(T_1) = 1$ and $\mathrm{sum}(\check{T}_1) = 0$.

(3) For each $i$ in $\{1, \ldots, \ell - 1\}$, $a \cdot \mathrm{sum}(T_i) + \mathrm{sum}(\check{T}_i) \geq \mathrm{sum}(T_{i+1}) + \mathrm{sum}(\check{T}_{i+1})$.

(4) There are no $i, j$ satisfying $1 \leq i < j \leq \ell$ and $(T_i, \check{T}_i) \leq (T_j, \check{T}_j)$.

Proof. For proving the statement, we first construct a rooted tree $S = (V, E)$ labelled by
sequences of pairs of multisets of $n$-tuples, where the depth of each node is equal to the
length of the sequence labelling it and such that the set of internal nodes of $S$ corresponds
exactly to the set of sequences satisfying conditions (1) to (4). Second, we show that $S$ is
finite. This concludes the proof, since finiteness of $S$ and its constructive definition imply
that $S$ is computable, and $B(a, n)$ can be defined as the maximal depth of $S$.

We define $V$ as the set of all the sequences $(T_1, \check{T}_1), \ldots, (T_\ell, \check{T}_\ell)$ of pairs of multisets
of $n$-tuples satisfying the conditions (1) to (3) and such that there are no $i, j$ satisfying
$1 \leq i < j < \ell$ and $(T_i, \check{T}_i) \leq (T_j, \check{T}_j)$. This last condition, that we will refer to as (5), is
weaker than (4) since in (5) we have $j < \ell$ instead of $j \leq \ell$. Thus, all sequences satisfying
conditions (1) to (4) belong to $V$. Note that $V$ contains the empty sequence, which we
denote as $\varepsilon$. We define $E \subseteq V^2$ as the set of edges containing $(T_1, \check{T}_1), \ldots, (T_i, \check{T}_i) \to
(T_1, \check{T}_1), \ldots, (T_i, \check{T}_i), (T_{i+1}, \check{T}_{i+1})$ for every such couple of sequences in $V$.

It is quite obvious that $S = (V, E)$ is a tree rooted at $\varepsilon$, since $\varepsilon$ does not have an input
edge, each sequence of length 1 has a unique input edge coming from $\varepsilon$, and each sequence
of length $i > 1$ has a unique input edge coming from its unique prefix sequence of length
$i - 1$. Also, the set of internal nodes of $S$ is exactly the set of sequences satisfying conditions
(1) to (4), and the set of leaves of $S$ is exactly the set of sequences satisfying conditions (1)
to (3), and (5), but not (4).

It remains to show that $S$ is finite. To this end, it suffices to see that $S$ is finitely
branching and that there is no path with infinite length.

First, we prove that each node $v \in V$ has a finite branching: $\varepsilon$ links to all the sequences
of length 1, the number of which is bounded by conditions (1) and (2); and each sequence
$(T_1, \check{T}_1), \ldots, (T_i, \check{T}_i)$ can only link to sequences of the form $(T_1, \check{T}_1), \ldots, (T_i, \check{T}_i), (T_{i+1}, \check{T}_{i+1})$,
the number of which is bounded by conditions (1) and (3).

Second, we prove that there is no path with infinite length in $S$ in a standard way.
We proceed by contradiction by assuming that we have an infinite path $v_0, v_1, v_2, v_3, \ldots$ By
construction, we have $v_0 = \varepsilon$, and for all $i \geq 1$ and all $j \geq i$, the prefix of length $i$ of the
sequence $v_j$ is equal to $v_i$. Consider the infinite sequence $(T_1, \check{T}_1), (T_2, \check{T}_2), \ldots$ where for all
$i \geq 1$, $(T_i, \check{T}_i)$ is the last element of the sequence $v_i$. Since $\leq$ on pairs of multisets of $n$-tuples
is a well quasi-ordering, there exist two indexes $i, j$ satisfying $i < j$ and $(T_i, \check{T}_i) \leq (T_j, \check{T}_j)$.
Hence, all sequences $v_k$ for $k > j$ do not satisfy condition (5), and hence they do not belong
to $V$, contradicting the infiniteness of the path. □
Figure 6: Multisets $r_{H_i}$, $r_{\check{H}_i}$ and $r_{\hat{H}_i}$ of Example 5.17

In order to bound the height of a term accepted by a given TABG$^{\wedge}$ $\mathcal{A}$ (and of minimum
height), Lemma 5.14 will be used by taking $a$ to be the maximum arity of the signature of
$\mathcal{A}$, and $n$ to be the number of states of $\mathcal{A}$.

5.3. Mapping a run to a sequence of the well quasi-ordered set. We will associate,
to each number $i$ in $\{1, \ldots, h(r)\}$, a pair of multisets of $n$-tuples of natural numbers, which
can be compared with other pairs according to the definition of $\leq$ in the previous subsection.
To this end, we first associate $n$-tuples to terms and multisets of $n$-tuples to sets of positions.

Definition 5.15. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $E$ be $E_{\mathcal{A}}$. Let $q_1, \ldots, q_n$ be the states of $\mathcal{A}$. Let
$r$ be a run of $\mathcal{A}$. Let $P$ be a set of positions of $r$. Let $t$ be a term. We define $r_{t,P}$ as the
following tuple of natural numbers: $(|\{p \in P \mid \mathrm{term}(r|_p) =_E t \wedge r(p) = q_1\}|, \ldots, |\{p \in P \mid
\mathrm{term}(r|_p) =_E t \wedge r(p) = q_n\}|)$.

Definition 5.16. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $E$ be $E_{\mathcal{A}}$. Let $r$ be a run of $\mathcal{A}$. Let $P$ be a set
of positions of $r$. Let $\{[t_1], \ldots, [t_k]\}$ be the set of equivalence classes modulo $E$ of the set
of terms $\{\mathrm{term}(r|_p) \mid p \in P\}$ with representatives $t_1, \ldots, t_k$. We define $r_P$ as the multiset
$[r_{t_1,P}, \ldots, r_{t_k,P}]$.
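The two definitions above can be sketched in Python for the special case of an empty equational theory, where equality modulo the theory is plain syntactic equality. This restriction, the encoding of a run as a dictionary from positions to `(state, term)` pairs, and the function name are assumptions made only for this illustration.

```python
from collections import Counter


def multiset_of_tuples(run, positions, states):
    """Multiset of state-count tuples, one tuple per distinct term occurring
    at the given positions (syntactic equality, i.e. empty theory).

    run: dict mapping a position to a pair (state, term), with hashable terms.
    states: fixed ordering q_1, ..., q_n of the states.
    The multiset is returned as a sorted list of n-tuples.
    """
    by_term = {}
    for p in positions:
        state, term = run[p]
        by_term.setdefault(term, Counter())[state] += 1
    return sorted(tuple(count[q] for q in states) for count in by_term.values())
```

For instance, if a term `a` is reached twice with state `q1` and a term `b` once with state `q2`, the result for the ordering `(q1, q2)` is the multiset `[(0, 1), (2, 0)]`. By construction the all-zero tuple never occurs, since every class has at least one position.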

Example 5.17. Following our running example, for the representation of the $n$-tuples of
natural numbers we order the states as $(q_d, q_n, q_{id}, q_t, q_l, q_m)$. The multisets $r_{H_i}$, $r_{\check{H}_i}$ and
$r_{\hat{H}_i}$ are presented in Figure 6.

The following lemma connects the existence of a pump-injection with the quasi-ordering 
relation. 

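Lemma 5.18. Let $\mathcal{A}$ be a TABG$^{\wedge}$. Let $r$ be a run of $\mathcal{A}$. Let $i, j$ be integers satisfying
$1 \leq j < i \leq h(r)$.

Then, there exists a pump-injection $I : (H_i \cup \check{H}_i \cup \hat{H}_i) \to (H_j \cup \check{H}_j \cup \hat{H}_j)$ if and only if
$(r_{H_i}, r_{\check{H}_i}) \leq (r_{H_j}, r_{\check{H}_j})$.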

Proof. Although we prove both directions of the double implication, the left-to-right one is 
technical but not conceptually difficult, and it is not necessary for the rest of the paper. In 
the following, we write E for E_a. 

=^) Assume that there exists a pump-injection / : {Hi U HiU Hi) — )• {Hj U Hj U Hj). We 
just prove r//. < rn^, since rg, < r^j, can be proved analogously. By Condition (Ci) of 
the definition of pump-injection, I{Hi) C Hj holds. We write the equivalence classes of 
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{terin(r|j,) | p € Hi} and {term(r|p) | p G Hj} modulo E more explicitly as • • • , [ti,a]} 

and {[tj^i], [tj,^]}, respectively. Hence, it remains to prove that [n. . . ■ ,rt.^,ffj < 
[rtj^i,Hj, ■ ■ ■ ,rtj i^,Hj]- To this end we define the function /' : {!,...,«} — > as 
follows. For each 7 in {1, ... , a}, we choose a position p in Hi satisfying teriii(r|p) =e U^^, 
determine the index S of the term tj^s satisfying tj^s =E ■term(r|/(p)), and define /'(7) := S. 
This function /' is injective due to Condition (C3) of the definition of pump-injection. In 
order to conclude, it suffices to prove rt-^^Hi < T~t^ jn^^^^,Hj for each 7 in {!,..., q}. We 
just prove it for 7=1. For proving rt^^^Hi ^ 'rt.^i^^^^Hj it suffices to prove the following 
statement for each state q of A: \{p G Hi | teriii(r|p) =e A r{p) = q}\ < \{p € Hj \ 
terin(r|p) =e tj,i'(i) ^r{p) = q}\. 

To this end, since / is injective, it suffices to prove that I{{p G Hi \ term(r|p) =e 
ti,i A r{p) = q}) is included in {p G Hj \ term(r|p) =e tj,r(i) A r{p) = q} for each state q of 
A. Thus, consider any p of {p & Hi \ teriii(r|p) =e A r{p) = q}. Let p' be the chosen 
position for defining /'(I). In particular, terin(r|p/) =e and term(r|7(p/)) =e hold. 
Note that terni(r|p) =e term(r|p/) =e ijj holds. Thus, by Condition (C3) of the definition 
of pump-injection, term(r|7(p)) =e term(r|7(p/)) holds. Therefore, term(r[/(p)) =e 
holds. In order to show the inclusion I{p) G {p G Hj \ terin(r|p) =e /\r{p) = q} 

it remains to see r{I{p)) = q. Note that, since p belongs to {p ^ Hi \ term(r|p) =e 
ti,i A r{p) = q}, r{p) = q holds. By Condition (C2) of the definition of pump-injection, 
r{I{p)) = r{p) = q holds, and we are done. 

($\Leftarrow$) Assume that $(r_{H_i}, r_{\check H_i}) \le (r_{H_j}, r_{\check H_j})$ holds. We have to construct a pump-injection
$I : (H_i \cup \check H_i \cup \hat H_i) \to (H_j \cup \check H_j \cup \hat H_j)$. By the definition of pump-injection, the restriction
$I : \hat H_i \to \hat H_j$ must be defined as the identity, which is not a problem since $\hat H_i$ is always
included in $\hat H_j$. Conditions (C2) and (C3) are satisfied for free for these positions. Moreover,
for positions $p'_1 \in H_i \cup \check H_i$ and $p'_2 \in \hat H_i$, Condition (C3) holds whenever Condition (C1) holds,
since in this case $\mathrm{term}(r|_{p'_1}) \neq_E \mathrm{term}(r|_{p'_2})$ and $\mathrm{term}(r|_{I(p'_1)}) \neq_E \mathrm{term}(r|_{I(p'_2)})$ hold.

Hence, it remains to define $I : (H_i \cup \check H_i) \to (H_j \cup \check H_j)$. We just define $I : H_i \to H_j$
and prove Conditions (C2) and (C3) for $p, p_1, p_2$ in $H_i$. This is because $I : \check H_i \to \check H_j$
can be defined analogously, and Conditions (C2) and (C3) for the corresponding positions
can be checked analogously. Moreover, for positions $p'_1 \in H_i$ and $p'_2 \in \check H_i$, Condition (C3)
holds whenever Condition (C1) holds, since in this case $\mathrm{term}(r|_{p'_1}) \neq_E \mathrm{term}(r|_{p'_2})$
and $\mathrm{term}(r|_{I(p'_1)}) \neq_E \mathrm{term}(r|_{I(p'_2)})$ hold. Hence, this simple case is enough to prove the
whole statement.

We write the set of equivalence classes of $\{\mathrm{term}(r|_p) \mid p \in H_i\}$ and $\{\mathrm{term}(r|_p) \mid p \in H_j\}$
modulo $E$ more explicitly as $\{[t_{i,1}], \ldots, [t_{i,\alpha}]\}$ and $\{[t_{j,1}], \ldots, [t_{j,\beta}]\}$, respectively. Since
$(r_{H_i}, r_{\check H_i}) \le (r_{H_j}, r_{\check H_j})$ holds, $r_{H_i} \le r_{H_j}$ also holds. Thus, there exists an injective function
$I' : \{1,\ldots,\alpha\} \to \{1,\ldots,\beta\}$ satisfying the following statement for each $\delta$ in $\{1,\ldots,\alpha\}$ and
each state $q$ of $A$:
$|\{p \in H_i \mid \mathrm{term}(r|_p) =_E t_{i,\delta} \wedge r(p) = q\}| \le |\{p \in H_j \mid \mathrm{term}(r|_p) =_E t_{j,I'(\delta)} \wedge r(p) = q\}|$ ($\dagger$).

In order to define $I : H_i \to H_j$, we define $I$ for each of such sets
$\{p \in H_i \mid \mathrm{term}(r|_p) =_E t_{i,\delta} \wedge r(p) = q\}$ as any injective function
$I : \{p \in H_i \mid \mathrm{term}(r|_p) =_E t_{i,\delta} \wedge r(p) = q\} \to \{p \in H_j \mid \mathrm{term}(r|_p) =_E t_{j,I'(\delta)} \wedge r(p) = q\}$,
which is possible by the above inequality ($\dagger$). The
global $I$ is then injective thanks to the injectivity of $I'$. Conditions (C2) and (C3) trivially
follow from this definition. □
Example 5.19. Following our running example, we first prove $(r_{H_4}, r_{\check H_4}) \le (r_{H_3}, r_{\check H_3})$.
To this end just note that $[(0,0,0,0,1,0)] \le [(0,0,0,0,1,0)]$ and that $[(0,0,0,1,0,0)] \le [(0,0,0,2,0,0)]$
hold. We can define $I : (H_4 \cup \check H_4 \cup \hat H_4) \to (H_3 \cup \check H_3 \cup \hat H_3)$ from this relation
according to Lemma 5.18. Doing the adequate guess we obtain the following definition:
$I(1) = 1$, $I(2) = 2$, $I(3) = 3.3$, which is the pump-injection considered in Example 5.5 for
our running example.
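
The two comparisons noted in this example can be checked mechanically. The following Python sketch assumes, as the proof of Lemma 5.18 suggests, that a multiset of tuples is below another one when some injective index map $I'$ sends every tuple of the first to a componentwise larger tuple of the second. The function names are illustrative, and the brute-force search over injective maps is only meant for small inputs.

```python
from itertools import permutations

def tuple_leq(u, v):
    """Componentwise comparison of two tuples of naturals of equal length."""
    return len(u) == len(v) and all(x <= y for x, y in zip(u, v))

def multiset_leq(small, large):
    """Return an injective index map I' witnessing small <= large, or None.

    The map is returned as a tuple: I'[g] = d means that small[g] is
    componentwise below large[d]. All injective maps are tried in turn.
    """
    for image in permutations(range(len(large)), len(small)):
        if all(tuple_leq(small[g], large[d]) for g, d in enumerate(image)):
            return image
    return None

# The two comparisons noted in Example 5.19.
w1 = multiset_leq([(0, 0, 0, 0, 1, 0)], [(0, 0, 0, 0, 1, 0)])
w2 = multiset_leq([(0, 0, 0, 1, 0, 0)], [(0, 0, 0, 2, 0, 0)])
```

A returned witness plays the role of the function $I'$ in the proof; `None` means that no injective map exists.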

The following lemma follows directly from the definition of the sets $H_i$ and $\check H_i$, and
allows us to connect these definitions with Lemma 5.14.

Lemma 5.20. Let $A$ be a TABG$^\wedge$. Let $a$ be the maximum arity of the symbols in the signature
of $A$. Let $r$ be a run of $A$. Then, the following conditions hold:

(1) $|H_{h(r)}| = 1$ and $|\check H_{h(r)}| = 0$.

(2) For each $i$ in $\{2, \ldots, h(r)\}$, $a \cdot |H_i| + |\check H_i| \ge |H_{i-1}| + |\check H_{i-1}|$.

(3) For each $i$ in $\{1, \ldots, h(r)\}$, $|H_i| = \mathrm{sum}(r_{H_i})$ and $|\check H_i| = \mathrm{sum}(r_{\check H_i})$.

Proof. Item (1) is trivial by definition of $H_i$ and $\check H_i$ for $i = h(r)$. For Item (2), it suffices to
observe that the positions in $H_{i-1} \cup \check H_{i-1}$ are all the positions in $\check H_i$ plus a subset of all child
positions of positions in $H_i$, and that each position has at most $a$ children. For Item (3)
we just prove $|H_i| = \mathrm{sum}(r_{H_i})$, since $|\check H_i| = \mathrm{sum}(r_{\check H_i})$ can be proved analogously. We write
the equivalence classes of the set $\{\mathrm{term}(r|_p) \mid p \in H_i\}$ modulo $E$ more explicitly as
$\{[t_1], \ldots, [t_\alpha]\}$.

Note that $H_i$ is the disjoint union $\{p \in H_i \mid \mathrm{term}(r|_p) =_E t_1\} \cup \ldots \cup \{p \in H_i \mid \mathrm{term}(r|_p) =_E t_\alpha\}$.
Thus, $|H_i|$ equals $|\{p \in H_i \mid \mathrm{term}(r|_p) =_E t_1\}| + \ldots + |\{p \in H_i \mid \mathrm{term}(r|_p) =_E t_\alpha\}|$.
We conclude by observing that $|\{p \in H_i \mid \mathrm{term}(r|_p) =_E t_1\}| = \mathrm{sum}(r_{t_1,H_i})$, $\ldots$,
$|\{p \in H_i \mid \mathrm{term}(r|_p) =_E t_\alpha\}| = \mathrm{sum}(r_{t_\alpha,H_i})$ hold. □

Lemma 5.21. Let $B : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be the computable function of Lemma 5.14. Let $A$ be
a TABG$^\wedge$. Let $a$ be the maximum arity of the symbols in the signature of $A$. Let $n$ be the
number of states of A. Let r be a run of A satisfying h{r) > B{a,n). Then, there is a 
global pumping on r. 

Proof. Consider the sequence $(r_{H_{h(r)}}, r_{\check H_{h(r)}}), \ldots, (r_{H_1}, r_{\check H_1})$. Note that the $n$-tuple $(0,\ldots,0)$
does not appear in the multisets of the pairs of this sequence. By Lemma 5.20, $|H_{h(r)}| = 1$
and $|\check H_{h(r)}| = 0$ hold, and for each $i$ in $\{2, \ldots, h(r)\}$, $a \cdot |H_i| + |\check H_i| \ge |H_{i-1}| + |\check H_{i-1}|$ holds.
Moreover, for each $i$ in $\{1, \ldots, h(r)\}$, $|H_i| = \mathrm{sum}(r_{H_i})$ and $|\check H_i| = \mathrm{sum}(r_{\check H_i})$ hold. Thus,
$\mathrm{sum}(r_{H_{h(r)}}) = 1$, $\mathrm{sum}(r_{\check H_{h(r)}}) = 0$, and for each $i$ in $\{2, \ldots, h(r)\}$,
$a \cdot \mathrm{sum}(r_{H_i}) + \mathrm{sum}(r_{\check H_i}) \ge \mathrm{sum}(r_{H_{i-1}}) + \mathrm{sum}(r_{\check H_{i-1}})$ hold.
Hence, since $h(r) > B(a,n)$ holds, by Lemma 5.14 there exist
$i, j$ satisfying $h(r) \ge i > j \ge 1$ and $(r_{H_i}, r_{\check H_i}) \le (r_{H_j}, r_{\check H_j})$. By Lemma 5.18 there exists
a pump-injection $I : (H_i \cup \check H_i \cup \hat H_i) \to (H_j \cup \check H_j \cup \hat H_j)$. Therefore, there exists a global
pumping on $r$. □

Theorem 5.22. Emptiness is decidable for TABG$^\wedge$.

Proof. Let $A$ be a TABG$^\wedge$. Let $a$ be the maximum arity of the symbols in the signature of $A$. Let $n$ be the
number of states of $A$. Let $r$ be an accepting run of $A$ with minimum height.

Suppose that $h(r) > B(a, n)$ holds. Then, by Lemma 5.21, there exists a global pumping
r' on r. By Corollary [521 h{r') < h[r) holds. Moreover, by the definition of global pumping, 
r'(A) = r(A) holds. Finally, by Corollary 15. 11^ r' is a run of A. Thus, r' contradicts the 
minimality of $r$. We conclude that $h(r) \le B(a,n)$ holds.

The decidability of emptiness of $A$ follows: if $A$ has a successful run, then it has one of height at
most $B(a,n)$, so a successful run can be searched for among a computable and finite set of possibilities. □

Using Corollary 4.20 and Theorem 5.22 we can conclude the decidability of emptiness
for TABG, and more generally for TABG[$\approx, \not\approx, \mathbb{N}$].

Corollary 5.23. Emptiness is decidable for TABG. 

Corollary 5.24. Emptiness is decidable for TABG[$\approx, \not\approx, \mathbb{N}$].

6. Unranked Ordered Trees 

Our tree automata models and results can be generalized from ranked to unranked ordered
terms. In this setting, $\Sigma$ is called an unranked signature, meaning that there is no arity
fixed for its symbols, i.e. that in a term $a(t_1, \ldots, t_n)$, the number $n$ of children is arbitrary
and does not depend on $a$. Let us denote by $\mathcal{U}(\Sigma)$ the set of unranked ordered terms over
$\Sigma$. The notions of positions, subterms, etc., are defined for unranked terms of $\mathcal{U}(\Sigma)$ as for
ranked terms of $\mathcal{T}(\Sigma)$.

We extend the definition of automata for unranked ordered terms, called hedge automata
[Mur99], with global constraints. We do not consider constraints between brothers
nor flat theories in this setting.

Definition 6.1. A hedge automaton with global constraints (HAG) over an unranked signature
$\Sigma$ is a tuple $A = \langle Q, \Sigma, F, \Delta, C\rangle$ where $Q$ is a finite set of states, $F \subseteq Q$ is the subset of
final states, $C$ is a Boolean combination of atomic constraints of the form $q \approx q'$ or $q \not\approx q'$,
with $q, q' \in Q$, and $\Delta$ is a set of transition rules of the form $a(L) \to q$ where $a \in \Sigma$, $q \in Q$
and $L \subseteq Q^*$ is a regular (word) language, assumed given by a finite state automaton with
input alphabet $Q$.

We still use the notation HAG[$\tau_1, \ldots, \tau_n$] where the types $\tau_i$ can be $\approx$, $\not\approx$, $|.|_{\mathbb{N}}$, $\|.\|_{\mathbb{N}}$, $\mathbb{N}$.

The notion of run of TAG is extended to HAG in the natural way. A run of a HAG $A$ is
a pair $r = \langle t, M\rangle$ where $t \in \mathcal{U}(\Sigma)$ is an unranked ordered term and $M$ is a mapping from
$Pos(t)$ into $\Delta_A$ such that for each position $p \in Pos(t)$ with $n$ children, if $M(p.1), \ldots, M(p.n)$
are rules with right-hand side states $q_1, \ldots, q_n \in Q_A$, respectively, then $M(p)$ is a transition
rule of the form $t(p)(L) \to q$ in $\Delta_A$, and the word $q_1 \cdots q_n$ belongs to $L$. Moreover, $r \models C_A$,
where satisfiability of $C_A$ by $r$ is defined like in Section 3. A run $r$ is called successful (or
accepting) if $r(\lambda) \in F_A$.

The emptiness decision results of Corollary 5.24 can be transposed from TAG into HAG
using a standard transformation from unranked to ranked binary terms, like the extension
encoding described in [CDG+07], Chapter 8.

Let us associate to the unranked signature $\Sigma$ the (ranked) signature $\Sigma_@ := \{a : 0 \mid a \in \Sigma\} \cup \{@ : 2\}$
where $@$ is a new symbol not in $\Sigma$. The operator curry is a bijection from
$\mathcal{U}(\Sigma)$ into $\mathcal{T}(\Sigma_@)$ recursively defined as follows:

$\mathrm{curry}(a) = a$ for all $a \in \Sigma$
$\mathrm{curry}(a(t_1, \ldots, t_n)) = @(\mathrm{curry}(a(t_1, \ldots, t_{n-1})), \mathrm{curry}(t_n))$

An example of application of this operator is presented in Figure 7. We extend the application
of the operator curry to sets of unranked ordered terms by $\mathrm{curry}(L) = \{\mathrm{curry}(t) \mid t \in L\}$.

Figure 7: Currying an unranked term.
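
The recursive definition of curry translates directly into code. The following Python sketch assumes that an unranked term is represented as a pair `(label, children)` with `children` a list, and that the string `"@"` is not used as a label of the unranked signature; both the representation and the helper `uncurry` are illustrative choices, not part of the original construction.

```python
def curry(term):
    """Curry an unranked term (label, children) into a binary term over constants and '@'."""
    label, children = term
    if not children:
        return (label, [])                      # curry(a) = a
    *init, last = children
    # curry(a(t1, ..., tn)) = @(curry(a(t1, ..., t_{n-1})), curry(tn))
    return ("@", [curry((label, init)), curry(last)])

def uncurry(term):
    """Inverse of curry, assuming '@' never occurs as an original label."""
    label, children = term
    if label != "@":
        return (label, [])
    left, right = children
    head, args = uncurry(left)                  # left spine carries the head symbol
    return (head, args + [uncurry(right)])

# a(b, c) becomes @(@(a, b), c)
example = curry(("a", [("b", []), ("c", [])]))
```

Since `uncurry(curry(t))` returns `t` on such terms, the sketch also illustrates why equality between subterms of the original term is preserved by the encoding.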

Proposition 6.2. For all HAG[$\approx, \not\approx, \mathbb{N}$] $A$ over $\Sigma$, one can construct effectively in PTIME
a TAG[$\approx, \not\approx, \mathbb{N}$] $A'$ over $\Sigma_@$ such that $L(A') = \mathrm{curry}(L(A))$.

Proof. Let $A$ be $\langle Q, \Sigma, F, \Delta, C\rangle$ more explicitly written. Without loss of generality, we
assume that for each $a \in \Sigma$, $q \in Q$, the set of rules $\Delta$ contains exactly one transition of the
form $a(L) \to q$, and we denote by $A_{a,q}$ the NFA recognizing the corresponding language
$L$. Recall that such automata have $Q$ as input alphabet. Without loss of generality, we
assume that the sets of states of $A$ and all $A_{a,q}$ are pairwise disjoint. Let $\bar Q$ be the union
of all states of all the automata $A_{a,q}$. Intuitively, the transitions of the automaton $A'$ will
simulate both the transitions of $A$ and the transitions of the NFAs $A_{a,q}$, when running on
$\mathrm{curry}(t)$ for some $t \in \mathcal{U}(\Sigma)$.

Let $A' = \langle Q \cup \bar Q, \Sigma_@, F, \Delta', C\rangle$ where $\Delta'$ contains the following transitions for each
$a \in \Sigma$, $q \in Q$:

• $a \to q$ if $A_{a,q}$ recognizes the empty word,

• $a \to \bar q$ where $\bar q$ is the initial state of $A_{a,q}$,

• $@(\bar q, q') \to \bar q'$ if there is a transition $\bar q \xrightarrow{q'} \bar q'$ in $A_{a,q}$, and

• $@(\bar q, q') \to q$ if there is a transition $\bar q \xrightarrow{q'} \bar q'$ in $A_{a,q}$ and $\bar q'$ is a final state of $A_{a,q}$.

It is not difficult to see that there exists an accepting run of $A$ on a term $t$ if and only if there exists
an accepting run of $A'$ on $\mathrm{curry}(t)$. □
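
The four kinds of transitions of $A'$ can be generated mechanically from the NFAs. The following Python sketch only illustrates this step and leaves the global constraint $C$ aside, since it is copied unchanged. It assumes a hypothetical representation in which each NFA is a dictionary with a single initial state, a set of final states and a set of letter-labelled transitions without empty-word moves, so that the NFA accepts the empty word exactly when its initial state is final. The function `reachable` is an added helper that evaluates a curried term bottom-up.

```python
def ranked_rules(hag_rules):
    """Build the transitions of A' from HAG rules given as {(a, q): nfa}.

    Each nfa is a dict with keys 'init' (state), 'finals' (set of states) and
    'delta' (set of triples (s, letter, s2), where letter is a state of A).
    Constant rules are pairs (a, target); binary rules are ('@', left, right, target).
    """
    rules = set()
    for (a, q), nfa in hag_rules.items():
        if nfa["init"] in nfa["finals"]:
            rules.add((a, q))                    # a -> q: the empty word is accepted
        rules.add((a, nfa["init"]))              # a -> initial state of the NFA
        for (s, letter, s2) in nfa["delta"]:
            rules.add(("@", s, letter, s2))      # simulate one NFA step
            if s2 in nfa["finals"]:
                rules.add(("@", s, letter, q))   # the step completes an accepted word
    return rules

def reachable(term, rules):
    """States reachable at the root of a curried term (label, children)."""
    label, children = term
    if not children:
        return {r[1] for r in rules if len(r) == 2 and r[0] == label}
    left, right = children
    ls, rs = reachable(left, rules), reachable(right, rules)
    return {r[3] for r in rules if len(r) == 4 and r[1] in ls and r[2] in rs}

# Hypothetical HAG: b() -> qb, and a(L) -> qa with L = qb qb* (at least one child).
nfa_b = {"init": "s0", "finals": {"s0"}, "delta": set()}
nfa_a = {"init": "n0", "finals": {"n1"},
         "delta": {("n0", "qb", "n1"), ("n1", "qb", "n1")}}
rules = ranked_rules({("b", "qb"): nfa_b, ("a", "qa"): nfa_a})
```

On the curried form of $a(b, b)$ the state `qa` is reachable at the root, whereas the constant `a` alone only reaches the initial NFA state, matching the requirement of at least one child.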

There exist alternative encodings from unranked to ranked trees in the literature, e.g.,
the first-child next-sibling encoding: see Figure 8 for an example of this transformation. This
alternative encoding makes the representation of equality and disequality between subterms
of the original unranked term difficult, since the transformed subterms may have original
siblings occurring now as their subterms. For example, in Figure 8 the two occurrences of
the subterm $c$ correspond to different terms in the result of the transformation.
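
The difficulty can be reproduced on a small term. The following Python sketch uses one common formulation of the first-child next-sibling encoding, with an assumed extra constant `"#"` for the empty hedge; the term $a(c, b, c)$ is a hypothetical example and not necessarily the one drawn in Figure 8.

```python
def fcns(hedge):
    """First-child next-sibling encoding of a list of unranked terms (a hedge).

    Each term is a pair (label, children). The first argument of an encoded node
    is the encoding of its children, the second the encoding of its right siblings.
    """
    if not hedge:
        return ("#", [])                         # '#' stands for the empty hedge
    (label, children), *siblings = hedge
    return (label, [fcns(children), fcns(siblings)])

# Hypothetical term a(c, b, c) with two occurrences of the subterm c.
c = ("c", [])
encoded = fcns([("a", [c, ("b", []), c])])
first_c = encoded[1][0]                          # encoding rooted at the first c
second_c = first_c[1][1][1][1]                   # encoding rooted at the second c
```

The two occurrences of `c` are equal in the original term, but `first_c` contains the encoding of its right siblings `b` and `c`, so the two encoded subterms differ.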

The following emptiness decision result is a direct consequence of Proposition 6.2 and
Corollary 5.24.

Corollary 6.3. Emptiness is decidable for HAG[$\approx, \not\approx, \mathbb{N}$].



7. Logics on Trees 

In this section, we discuss the application of our results to second order logics interpreted 
over domains defined by terms. We propose a strict extension of the second order monadic 
logic of the tree with equality, disequality and arithmetic constraints, and show that satisfiability
is decidable for this extension thanks to a correspondence with TAG[$\approx, \not\approx, \mathbb{N}$].

Figure 8: First-child next-sibling encoding of an unranked term.



7.1. MSO on Ranked Terms. A ranked term $t \in \mathcal{T}(\Sigma)$ over $\Sigma$ can be seen as a model
for logical formulae, with an interpretation domain which is the set of positions $Pos(t)$. We
consider monadic second order formulae interpreted on such models, built with the usual
Boolean connectors, with quantifications over first order variables (interpreted as positions),
denoted $x, y, \ldots$ and over unary predicates (i.e. second order variables interpreted as sets of
positions), denoted $X, Y, \ldots$, and with the following predicates:

• equality: $x = y$,

• membership: $X(x)$,

• labeling: $a(x)$, for $a \in \Sigma$,

• navigation: $S_i(x,y)$, for all $i$ smaller than or equal to the maximal arity of symbols of $\Sigma$
(we call $+1$ the type of such predicates),

• term equality: $X \approx Y$, term disequality: $X \not\approx Y$ (predicate types $\approx$ and $\not\approx$),

• linear inequalities: $\sum_i a_i \cdot |X_i| \ge a$ or $\sum_i a_i \cdot \|X_i\| \ge a$, where every $a_i$ and $a$ belong to $\mathbb{Z}$
(predicate types $|.|_{\mathbb{Z}}$ and $\|.\|_{\mathbb{Z}}$).

We write MSO[$\tau_1, \ldots, \tau_k$] for the set of monadic second order logic formulae with equality,
membership, labeling predicates and other predicates of types $\tau_1, \ldots, \tau_k$, amongst the above
types $+1$, $\approx$, $\not\approx$, $|.|_{\mathbb{Z}}$ and $\|.\|_{\mathbb{Z}}$. We also use the notations $|.|_{\mathbb{N}}$ and $\|.\|_{\mathbb{N}}$ for natural linear
inequalities (linear inequalities whose coefficients all have the same sign) and the abbreviations
Z and N of Section H 

Let $\exists$MSO[$\tau_1, \ldots, \tau_k$] be the fragment of MSO[$\tau_1, \ldots, \tau_k$] containing the formulae of the
form $\exists X_1 \ldots \exists X_n\, \phi$ such that all the atoms of type $\approx$, $\not\approx$, $\mathbb{Z}$ or $\mathbb{N}$ in $\phi$ involve only second
order variables amongst $X_1, \ldots, X_n$.

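A variable assignment into a term $t \in \mathcal{T}(\Sigma)$ is a function $\sigma$ mapping first order variables
into positions of $Pos(t)$ and second order variables into subsets of $Pos(t)$. The satisfiability
of a formula $\phi$ by a term $t \in \mathcal{T}(\Sigma)$ and a variable assignment $\sigma$, denoted $t, \sigma \models \phi$, is defined
in the usual Tarskian manner, with:

$t, \sigma \models x = y$ iff $\sigma(x) = \sigma(y)$

$t, \sigma \models X(x)$ iff $\sigma(x) \in \sigma(X)$

$t, \sigma \models a(x)$ iff $t(\sigma(x)) = a$

$t, \sigma \models S_i(x,y)$ iff $\sigma(x).i = \sigma(y)$

$t, \sigma \models X \approx Y$ iff $\forall p \in \sigma(X), p' \in \sigma(Y), p \neq p' : t|_p = t|_{p'}$

$t, \sigma \models X \not\approx Y$ iff $\forall p \in \sigma(X), p' \in \sigma(Y), p \neq p' : t|_p \neq t|_{p'}$

$t, \sigma \models \sum_i a_i \cdot |X_i| \ge a$ iff $\sum_i a_i \cdot |\sigma(X_i)| \ge a$

$t, \sigma \models \sum_i a_i \cdot \|X_i\| \ge a$ iff $\sum_i a_i \cdot |\{t|_p \mid p \in \sigma(X_i)\}| \ge a$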
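
The semantics of the non-standard atoms can be made concrete on small terms. The following Python sketch evaluates them directly from the clauses above, assuming that terms are nested pairs `(label, children)` with hashable tuples of children and that positions are tuples of child indices starting at 1. All function names are illustrative, and the term used at the end is a hypothetical example.

```python
def T(label, *children):
    """Build a ranked term as a hashable pair (label, tuple of children)."""
    return (label, tuple(children))

def positions(t, prefix=()):
    """Enumerate Pos(t); the root is () and p.i is p + (i,)."""
    yield prefix
    for i, child in enumerate(t[1], start=1):
        yield from positions(child, prefix + (i,))

def subterm(t, p):
    """Return the subterm of t at position p."""
    for i in p:
        t = t[1][i - 1]
    return t

def sat_eq(t, X, Y):
    """Term equality atom: distinct positions of X and Y carry equal subterms."""
    return all(subterm(t, p) == subterm(t, q) for p in X for q in Y if p != q)

def sat_neq(t, X, Y):
    """Term disequality atom: distinct positions of X and Y carry different subterms."""
    return all(subterm(t, p) != subterm(t, q) for p in X for q in Y if p != q)

def card(X):
    """|X|: number of positions in X."""
    return len(X)

def norm(t, X):
    """||X||: number of distinct subterms at positions of X."""
    return len({subterm(t, p) for p in X})

def sat_linear(coeffs, values, bound):
    """Linear inequality: sum of coeffs[i] * values[i] is at least bound."""
    return sum(a * v for a, v in zip(coeffs, values)) >= bound

# Hypothetical term f(a(b), a(c), a(b)): the a-headed subterms are not pairwise different.
t = T("f", T("a", T("b")), T("a", T("c")), T("a", T("b")))
X_a = {p for p in positions(t) if subterm(t, p)[0] == "a"}
```

Here `sat_neq(t, X_a, X_a)` fails because positions 1 and 3 carry the same subterm, while `card(X_a)` and `norm(t, X_a)` differ by one for the same reason.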

Example 7.1. The following formula of $\exists$MSO[$\approx, \not\approx$] expresses that all the subterms headed
by $a$ in a term $t$ are pairwise different: $\exists X_a\, ((\forall x\; X_a(x) \leftrightarrow a(x)) \wedge X_a \not\approx X_a)$. In other
words, a is used to mark monadic keys in t (see Example [32 



A seminal result of [TW68] shows that MSO[$+1$] has exactly the same expressiveness as
TA, and therefore it is decidable. The extension MSO[$+1, \approx$] is undecidable, see e.g. [FTT07].
The extension MSO[$+1, |.|_{\mathbb{Z}}$] is undecidable as well [KR02].

On the other side, the fragment $\exists$MSO[$+1, |.|_{\mathbb{Z}}$] is decidable [KR02], and a fragment of
$\exists$MSO[$+1, \approx, \not\approx$] is shown decidable in [FTT08] for a restricted variant of $\not\approx$, using a two way
correspondence between these formulae and a decidable subclass of TAGED.

This latter construction can be straightforwardly adapted to establish a two way correspondence
between $\exists$MSO[$+1, \approx, \not\approx, \mathbb{N}$] and TAG[$\approx, \not\approx, \mathbb{N}$].

Theorem 7.2. $\exists$MSO[$+1, \approx, \not\approx, \mathbb{N}$] is decidable on ranked terms.

Proof. Following the same proof scheme as [FTT08], we show that for every closed formula
$\phi$ in $\exists$MSO[$+1, \approx, \not\approx, \mathbb{N}$], we can construct a TAG[$\approx, \not\approx, \mathbb{N}$] recognizing exactly the set of models
of $\phi$. Then, the decidability of the logic follows from Corollary 5.24.
Without loss of generality, we may assume that $\phi$ is of the form

$\exists X_1 \ldots \exists X_n\, (\phi_0(\bar X) \wedge \phi_\approx(\bar X) \wedge \phi_{\mathbb{N}}(\bar X))$

where $\phi_0(\bar X)$ is an MSO[$+1$] formula with free variables $\bar X = X_1, \ldots, X_n$, and $\phi_\approx(\bar X)$ and
$\phi_{\mathbb{N}}(\bar X)$ are Boolean combinations of atoms of the respective forms $X_i \approx X_j$, $X_i \not\approx X_j$ and
$\sum_i a_i \cdot |X_i| \ge a$, $\sum_i a_i \cdot \|X_i\| \ge a$. Moreover, we shall also assume that $\phi_\approx(\bar X)$ and $\phi_{\mathbb{N}}(\bar X)$ are
conjunctions of atoms or negations of atoms of the above form. Otherwise, we put them
into disjunctive normal form and then split $\phi$ into an equivalent formula $\phi_1 \vee \ldots \vee \phi_k$, where
each $\phi_i$, $i \le k$, is of the form requested: $\phi_i = \exists X_1 \ldots \exists X_n\, (\phi_0^i(\bar X) \wedge \phi_\approx^i(\bar X) \wedge \phi_{\mathbb{N}}^i(\bar X))$, where
$\phi_0^i(\bar X) \in$ MSO[$+1$] and $\phi_\approx^i(\bar X)$ and $\phi_{\mathbb{N}}^i(\bar X)$ are conjunctions of atoms or negations of atoms
as above, and we solve satisfiability separately for each $\phi_i$.

First, we recall the definitions of [TW68] of the signature $\Sigma \times \{0,1\}^n$, where the arity of
a symbol $\langle f, b_1, \ldots, b_n\rangle$ is the arity of $f$, and of the term $t \otimes \sigma$ over this signature obtained,
from a term $t$ over $\Sigma$ and a mapping $\sigma : \{X_1, \ldots, X_n\} \to 2^{Pos(t)}$, by relabeling every
position $p \in Pos(t)$ by $\langle t(p), b_1, \ldots, b_n\rangle$, where for each $i \le n$, $b_i = 1$ if $p \in \sigma(X_i)$ and $b_i = 0$
otherwise. Also, from [TW68] we get the construction of a TA $A_0 = \langle Q, \Sigma \times \{0,1\}^n, F, \Delta_0\rangle$
which recognizes the set of terms $\{t \otimes \sigma \in \mathcal{T}(\Sigma \times \{0,1\}^n) \mid t, \sigma \models \phi_0(\bar X)\}$.
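
The relabeling that produces the annotated term can be sketched as follows. This Python fragment is an illustration under assumed representations: terms are pairs `(label, children)` with a tuple of children, positions are tuples of child indices starting at 1, the assignment is a dictionary from variable names to sets of positions, and each new label is the tuple made of the old label followed by the bits.

```python
def annotate(t, sigma, variables, pos=()):
    """Relabel every position p of t by (t(p), b_1, ..., b_n).

    b_i is 1 if p belongs to sigma[variables[i]] and 0 otherwise.
    """
    label, children = t
    bits = tuple(1 if pos in sigma[X] else 0 for X in variables)
    return ((label,) + bits,
            tuple(annotate(c, sigma, variables, pos + (i,))
                  for i, c in enumerate(children, start=1)))

# Hypothetical term f(a, b) with X1 = {1} and X2 = {root, 1}.
t = ("f", (("a", ()), ("b", ())))
sigma = {"X1": {(1,)}, "X2": {(), (1,)}}
annotated = annotate(t, sigma, ["X1", "X2"])
```

The root of `annotated` carries the label `("f", 0, 1)`, its first child `("a", 1, 1)` and its second child `("b", 0, 0)`.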

Second, following a construction in [NPTT05], we shift in $A_0$ the bit-vectors from the
signature into the state symbols, obtaining a TA $\langle Q \times \{0,1\}^n, \Sigma, F \times \{0,1\}^n, \Delta\rangle$ where
$\Delta$ contains all the transition rules

$f(\langle q_1, b_{1,1}, \ldots, b_{1,n}\rangle, \ldots, \langle q_m, b_{m,1}, \ldots, b_{m,n}\rangle) \to \langle q, b_1, \ldots, b_n\rangle$

such that $f \in \Sigma$, $\langle f, b_1, \ldots, b_n\rangle(q_1, \ldots, q_m) \to q \in \Delta_0$ and
$b_{1,1}, \ldots, b_{1,n}, \ldots, b_{m,1}, \ldots, b_{m,n} \in \{0,1\}$. This automaton recognizes the projection (on the first components) of the
terms recognized by $A_0$, i.e. it recognizes the set of terms $t \in \mathcal{T}(\Sigma)$ such that there exists
$\sigma : \{X_1, \ldots, X_n\} \to 2^{Pos(t)}$ satisfying $t, \sigma \models \phi_0(\bar X)$.
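
The shift of bit-vectors from symbols to states is a purely syntactic rewriting of the rule set. The following Python sketch assumes that a rule of the first automaton is a triple `((f, bits), args, q)` with `bits` a tuple of length `n` and `args` the tuple of argument states; the output format and the function name are likewise illustrative.

```python
from itertools import product

def shift_bits(delta0, n):
    """Move the bit-vectors of the symbols of delta0 into the states.

    Returns rules (f, lhs, (q, bits)) where lhs pairs every argument state with
    an arbitrary bit-vector, as in the second step of the proof.
    """
    vectors = list(product((0, 1), repeat=n))
    delta = set()
    for (f, bits), args, q in delta0:
        # The bit-vectors attached to the children are unconstrained.
        for child_bits in product(vectors, repeat=len(args)):
            lhs = tuple(zip(args, child_bits))
            delta.add((f, lhs, (q, bits)))
    return delta

# Hypothetical rule set with n = 1: <a,1> -> q and <f,0>(q, q) -> qf.
delta0 = {(("a", (1,)), (), "q"), (("f", (0,)), ("q", "q"), "qf")}
delta = shift_bits(delta0, 1)
```

A rule with $m$ arguments yields $2^{n \cdot m}$ rules, so the example produces one rule for `a` and four for `f`.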

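Third, we obtain a constraint $C$ by rewriting all the atoms of $\phi_\approx(\bar X) \wedge \phi_{\mathbb{N}}(\bar X)$ with the
following rules, where $\langle q, b_1, \ldots, b_n\rangle$ and $\langle q', b'_1, \ldots, b'_n\rangle$ range over the states of $Q \times \{0,1\}^n$:

$X_i \approx X_j \;\mapsto\; \bigwedge_{b_i = b'_j = 1} \langle q, b_1, \ldots, b_n\rangle \approx \langle q', b'_1, \ldots, b'_n\rangle$

$X_i \not\approx X_j \;\mapsto\; \bigwedge_{b_i = b'_j = 1} \langle q, b_1, \ldots, b_n\rangle \not\approx \langle q', b'_1, \ldots, b'_n\rangle$

$\sum_i a_i \cdot |X_i| \ge a \;\mapsto\; \sum_i a_i \cdot \sum_{b_i = 1} |\langle q, b_1, \ldots, b_n\rangle| \ge a$

$\sum_i a_i \cdot \|X_i\| \ge a \;\mapsto\; \sum_i a_i \cdot \sum_{b_i = 1} \|\langle q, b_1, \ldots, b_n\rangle\| \ge a$

The TAG[$\approx, \not\approx, \mathbb{N}$] $A = \langle Q \times \{0,1\}^n, \Sigma, F \times \{0,1\}^n, \Delta, C\rangle$ recognizes $\{t \in \mathcal{T}(\Sigma) \mid t \models \phi\}$. □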

The above transformation also works in the other direction (this result is not necessary
for the proof of Theorem 7.2 though): for every TAG[$\approx, \not\approx, \mathbb{N}$] $A$, we can construct a formula $\phi$
in $\exists$MSO[$+1, \approx, \not\approx, \mathbb{N}$], whose set of models is $L(A)$.

Note that $\exists$MSO[$+1, \approx$] is strictly more expressive than MSO, since the equality between
subterms is not expressible in MSO (see e.g. [CDG+07]). The TA construction of [TW68]
for the decidability of MSO[$+1$] involves the closure under projection on components for TA
languages over signatures made of tuples of symbols (for the elimination of $\exists$ quantifiers).
TAG languages are not closed under projection on some components of tuples, as it is already
the case for simpler forms of tree automata with equality [Tre00]. Thus, the same approach
cannot be used to prove decidability of emptiness of TAG.

7.2. MSO on Unranked Ordered Terms. In unranked ordered terms of $\mathcal{U}(\Sigma)$, the
number of children of a position is unbounded. Therefore, for navigating in such terms with
logical formulae, the successor predicates $S_i(x,y)$ of Section 7.1 are not sufficient. In order
to describe unranked ordered terms as models, we replace the above predicates $S_i$ by:

• $S_\downarrow(x,y)$ ($y$ is a child of $x$),

• $S_\to(x,y)$ ($y$ is the successor sibling of $x$).

The type of these predicates is still called $+1$. Note that the above predicates $S_1, S_2, \ldots$
can be expressed using these two predicates only.

The satisfiability of the above atoms by a term $t \in \mathcal{U}(\Sigma)$ and a variable assignment $\sigma$
is defined as follows:

$t, \sigma \models S_\downarrow(x,y)$ iff there exists $i$ such that $\sigma(x).i = \sigma(y)$,
$t, \sigma \models S_\to(x,y)$ iff there exist $p \in Pos(t)$ and $i$ such that $\sigma(x) = p.i$
and $\sigma(y) = p.(i+1)$.
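
The remark that the ranked successor predicates can be expressed with the child and successor-sibling predicates alone can be checked on explicit position sets. In the following Python sketch positions are tuples of child indices starting at 1; the function names are illustrative, and `successor_i` is one possible way of expressing the $i$-th successor, by starting from the child that has no preceding sibling and following $i - 1$ sibling steps.

```python
def is_child(px, py):
    """Child predicate: py = px.i for some i."""
    return len(py) == len(px) + 1 and py[:-1] == px

def is_next_sibling(px, py):
    """Successor-sibling predicate: px = p.i and py = p.(i+1) for some p and i."""
    return (len(px) > 0 and len(py) == len(px)
            and px[:-1] == py[:-1] and py[-1] == px[-1] + 1)

def successor_i(i, px, py, pos):
    """i-th successor predicate, using only the two predicates above over pos."""
    # First child of px: a child that is the successor sibling of no position.
    current = [c for c in pos
               if is_child(px, c) and not any(is_next_sibling(d, c) for d in pos)]
    for _ in range(i - 1):
        current = [d for c in current for d in pos if is_next_sibling(c, d)]
    return py in current

# Hypothetical position set of a term whose root has three children.
pos = {(), (1,), (2,), (3,), (2, 1)}
```

For instance, `successor_i(2, (), (2,), pos)` holds, while `successor_i(3, (), (2,), pos)` does not.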

It is shown in [SSM03] that the extension MSO[$+1, |.|_{\mathbb{Z}}$] is undecidable for unranked ordered
terms when counting constraints are applied to sibling positions.

Using the results of Section 6 and an easy adaptation of the automata construction in
the proof of Theorem 7.2, we can generalize Theorem 7.2 to $\exists$MSO over unranked ordered
terms.

Theorem 7.3. $\exists$MSO[$+1, \approx, \not\approx, \mathbb{N}$] is decidable on unranked ordered terms.

8. Conclusion 

We have answered (positively) the open problem of decidability of the emptiness problem
for the TAGED [FTT08], by proposing a decision algorithm for a class TABG of tree automata
with global constraints strictly extending the global constraints of TAGED in several directions.
Moreover, the TABG combine the global constraints with local tests between brother
subterms à la [BT92] and equality interpreted modulo flat theories. Our method for emptiness
decision, presented in Section 5, appeared to be robust enough to deal with several
extensions like global counting constraints, and generalization to unranked terms.

A challenging question would be to investigate the precise complexity of the emptiness
problem, avoiding the use of Higman's Lemma in the algorithm. For instance, in [FTT08], it
is shown, using a direct reduction into solving positive and negative set constraints [CP94,
GTT94, Ste94], that emptiness is decidable in NEXPTIME for TAGED (i.e. for TAG$^\wedge$[$\not\approx$]
modulo an empty theory and such that in every atomic constraint $q \not\approx q'$, $q$ and $q'$ are
distinct states). On the other hand, the best known lower bound for emptiness decision for
TABG is EXPTIME-hardness (this holds already for TAG$^\wedge$[$\approx$] as shown in [FTT08]).

Another interesting problem mentioned in the introduction is the combination of the
HAG of Section 6 with the unranked tree automata with tests between siblings, UTASC [WL07,
LW09]. Perhaps, the techniques of Section 5 could help for the emptiness decision for a
formalism using for instance MSO binary querying (following e.g. [NPTT05]) for selecting
the test position of global constraints.

Finally, another branch of research related to TABG concerns automata and logics for
data trees, i.e. trees labeled over an infinite (countable) alphabet (see [Seg06] for a survey).
Indeed, data trees can be represented by terms over a finite alphabet, with an encoding of
the data values into terms. This can be done in several ways, and with such encodings,
the data equality relation becomes the equality between subterms. Therefore, this could be
worth studying in order to relate our results on TAG to decidability results on automata or
logics on data trees like those in [JL07], [BMSL09].